
FreeLLMAPI

One OpenAI-compatible endpoint. Every free LLM provider. Real-time dashboard.

Aggregate the free tiers from Google, Groq, Cerebras, NVIDIA, Mistral, OpenRouter, GitHub Models, Cohere, Cloudflare, HuggingFace, Z.ai, Ollama, Kilo, Pollinations, LLM7, OVH AI Endpoints, plus any custom OpenAI-compatible endpoint — behind a single /v1/chat/completions drop-in. A smart router picks the best available key for each request, fails over transparently when a provider rate-limits you, queues requests instead of dropping them, routes outbound traffic through an HTTP proxy pool of your choice, and tracks everything in a live WebSocket dashboard.



What's new in v1.3

  • 🛰️ HTTP proxy pool — pool many proxies, choose a strategy (round-robin, least-latency, random, or single), per-key assignment that rotates hourly, dashboard with stats, geolocation, health-checker, bulk select / enable / disable / test / delete, and a France-only find-proxies scraper. A single proxy URL set in Settings is also tracked in the dashboard.
  • 📊 Live dashboard v2 — in-flight requests now show the proxy in use, input tokens and a growing output token counter that ticks up as the stream comes back. New session totals (Tokens IN / Tokens OUT).
  • 🤖 AI Profiles v2 — profile context window is auto-set to the smallest model's window, so a profile mixing 128K + 8K models never accepts a prompt the smaller model cannot fit. Round-robin API, per-profile stats panel, profile strategies (failover, round-robin, least-latency).
  • 🔑 Keys page v2 — model discovery now syncs the catalog: new models from the provider are added, stale ones are removed. Bulk auto-discover for custom endpoints, search bar, inline label/RPM/TPM editing.
  • 🌐 17 providers — Groq, Cerebras, Mistral, OpenRouter, NVIDIA NIM, GitHub Models, Cohere, Cloudflare, Google, Z.ai, HuggingFace, Ollama Cloud, Kilo Gateway, Pollinations, LLM7, OVH AI Endpoints, plus any custom OpenAI-compatible URL.
  • 🛠 Operational fixes — fixed fd-leak crash on test-all, dispatcher cache, SOCKS test endpoint, profile creation bug, model-stats scoping per user, pool-strategy persistence across restarts.
  • 🧪 505 server tests passing in 10s. Forgejo Actions workflow on every push and every v* tag.


Features

Gateway

  • Single OpenAI-compatible endpoint — /v1/chat/completions, /v1/models, /v1/embeddings. Any OpenAI client library works unchanged.
  • Smart failover router — tries provider keys in scored order (success rate, latency, rate-limit headroom). Fails over to the next key/provider silently.
  • Request queueing — when all keys for a model are rate-limited, requests are held and retried instead of immediately returning 429. Configurable timeout.
  • Per-user isolation — each user has their own provider keys, gateway keys, profiles, proxies, and request history. Zero cross-tenant data leakage.
  • Gateway API keys — mint scoped keys (freellmapi-…) for your apps with per-key RPM/TPD limits. Enable/disable without deleting.
  • Outbound rate limits — set per-provider-key RPM/TPM caps enforced between the router and the upstream API.
  • Context handoff — injects a compact system message when a session switches model between turns so the new model knows where things left off.
  • HTTP proxy pool — distribute outbound calls across many HTTP proxies (round-robin, least-latency, random), or pin to a single proxy. Per-key hourly assignment.
  • AES-256-GCM encryption — all provider API keys are encrypted at rest. The plaintext never leaves your server.

AI Profiles

  • Virtual models — create a named profile that fans out across a list of real models. Appears in /v1/models like any other model.
  • Routing strategies — failover, round-robin, or least-latency per profile.
  • Auto context window — profile's context_window is automatically set to the smallest model's window, so a profile mixing big and small models never accepts a prompt the smallest model cannot fit.
  • AUTO model — pseudo-model that picks the best available option across all your configured keys by score.
  • Embeddings parity — profiles work for embedding requests too.
  • Per-profile stats — request volume, error rate, latency, and token counts scoped to the profile.

Dashboard

  • Live WebSocket dashboard — in-flight request counter, per-request routing events (which provider/model/proxy is being tried), per-key RPM gauges, overview counters — all pushed in real time without polling.
  • In-flight proxy + tokens — active rows show the proxy name, input tokens, and a growing output token counter that ticks up live as the model streams.
  • Session token totals — Tokens IN and Tokens OUT aggregated over the dashboard session.
  • Per-user live counts — counts and events are scoped to the logged-in user.
  • Keys page — masked key display, inline label/RPM/TPM editing, health status, one-click model discovery, custom endpoint with bulk auto-discover.
  • Per-key deep stats — click any key to expand: latency percentiles (p50/p95/p99), 24-hour hourly bar chart, per-model breakdown, error breakdown, recent 25 requests, active cooldowns.
  • Live status dots — each key's indicator updates in real time when the server detects a 401/403/429. No manual refresh needed.
  • Playground — interactive chat with model selector, system prompt editor, and a live routing indicator showing which provider/model/proxy is handling the current request.
  • Profiles page — create and manage AI Profiles with a searchable model picker; add and remove models from the provider with one click.
  • Proxy pool page — bulk select, enable / disable / delete / test, per-proxy stats panel, geolocation, health checker.
  • Analytics — request volume, error rates, cost estimates, per-provider breakdown, exportable history.

Operations

  • Structured JSON logger — every request logs timestamp, level, req_id, user, key prefix, proxy, provider, and model. Secrets masked. Daily rotation with configurable retention.
  • Universal installer — install.sh auto-detects Docker vs Node.js+PM2, generates keys, writes .env, builds, and starts the service. Prompts to migrate from an older install.
  • Migration script — migrate-from-old.sh finds an older FreeLLMAPI install, verifies the encryption key, and imports provider keys and request history non-destructively.
  • find-proxies script — scans a list of candidate free-proxy URLs, geo-IP-verifies the country, tests connectivity, and writes the working set to working-proxies.txt for one-click pool import.
  • Docker — multi-stage Dockerfile, docker-compose.yml, named volume for the database.
  • PM2 — ecosystem.config.cjs with autorestart, memory cap, and structured log rotation.
  • Kubernetes / Podman — k8s/ manifests for podman play kube or kubectl apply.
  • Forgejo Actions — .gitea/workflows/ci.yml and release.yml run install + build + test on every push and every v* tag.

Supported providers

| Provider | Models | Auth |
| --- | --- | --- |
| Google Gemini | Gemini 2.5 Flash, 2.0 Flash, 2.5 Pro preview | API key |
| Groq | Llama 3.3/4, Qwen3, Gemma, compound-beta | API key |
| Cerebras | Qwen3 235B, Llama 3.3 70B | API key |
| Mistral | Large 3, Medium 3.5, Codestral, Devstral | API key |
| OpenRouter | 20+ free-tier models via :free routes | API key |
| GitHub Models | GPT-4.1, GPT-4o, Phi-4, Llama 3.3 | GitHub PAT |
| Cloudflare Workers AI | Kimi K2, GLM-4.7, Llama, Granite | Account ID + token |
| Cohere | Command R+, Command-A (trial key) | API key |
| NVIDIA NIM | 40 RPM free tier (eval-only ToS) | API key |
| HuggingFace | Inference router → DeepSeek, Kimi, Qwen3 | API key |
| Z.ai (Zhipu) | GLM-4.5, GLM-4.7 Flash | API key |
| Ollama Cloud | GLM-4.7, Kimi K2, Qwen3 | API key |
| Kilo Gateway | :free routes, no key required | Keyless |
| Pollinations | GPT-OSS 20B, no key required | Keyless |
| LLM7 | GPT-OSS, Llama 3.1, GLM, no key required | Keyless |
| OVH AI Endpoints | Qwen3.5 397B, Llama 3.3, no key required | Keyless |
| Custom endpoint | Any OpenAI-compatible URL — llama.cpp, LM Studio, vLLM, Ollama, Novita, etc. | API key (or keyless) |

Keyless providers don't require an API key. The Keys page stores a sentinel row for them so routing treats the platform as configured.


Quick start

git clone https://git.pandem.fr/outage.sh/FreeLLMapi freellmapi
cd freellmapi

# Recommended — universal installer
bash install.sh

# Or manually
cp .env.example .env
# Edit .env — set ENCRYPTION_KEY, ADMIN_USERNAME, ADMIN_PASSWORD
npm install && npm run build
node server/dist/index.js

Open http://localhost:3001 and log in with your admin credentials.


Install script

bash install.sh             # auto-detects Docker vs Node.js+PM2
bash install.sh --docker    # force Docker Compose
bash install.sh --node      # force Node.js + PM2
bash install.sh --port 8080 # custom port
bash install.sh --lan       # bind to 0.0.0.0 (LAN access)
bash install.sh --yes       # non-interactive / CI

The script detects Docker or Node.js, generates ENCRYPTION_KEY, prompts for admin credentials, writes .env, builds, starts the service, waits for the health check, and prints the access URL and management commands. After .env is written it asks if you want to migrate keys from an older install.


Docker

cp .env.example .env
# Set ENCRYPTION_KEY, ADMIN_USERNAME, ADMIN_PASSWORD in .env

docker compose up -d

# Verify
curl http://localhost:3001/api/ping

# Logs
docker compose logs -f

# Update
git pull && docker compose build && docker compose up -d

The database is stored in a named Docker volume (freellmapi-data) and survives container restarts and image rebuilds.

LAN access: set HOST_BIND=0.0.0.0 in .env to expose the port to your local network.


Node.js + PM2

Requires Node.js 20+.

cp .env.example .env   # edit before proceeding

npm install
npm run build

pm2 start ecosystem.config.cjs
pm2 save
pm2 startup            # enable autostart on reboot

# Logs
pm2 logs api-gateway

# Update
git pull && npm run build && pm2 restart api-gateway

The database lives at server/data/freeapi.db. Back it up before updates.


Migrating from an older install

If you ran a previous version of FreeLLMAPI and want to carry your provider keys forward:

# Auto-search the machine for an old install
bash migrate-from-old.sh

# Or point directly at the old repo directory
bash migrate-from-old.sh /path/to/old/freellmapi

The script:

  1. Locates the old freeapi.db and reads its ENCRYPTION_KEY
  2. Verifies decryption works before touching anything
  3. Handles ENCRYPTION_KEY mismatches — adopt the old key (recommended if the new DB is empty) or re-encrypt everything on the fly with the new key
  4. Imports all api_keys rows, skipping duplicates. Optionally imports request history.
  5. Never deletes from either database — fully non-destructive

After running, restart the server and go to Keys → Check all to confirm the imported keys are still valid with each provider.


Using the API

FreeLLMAPI speaks the standard OpenAI API. Point any client at your server:

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:3001/v1",
    api_key="freellmapi-your-gateway-key-here",
)

response = client.chat.completions.create(
    model="auto",   # router picks best available, or name a specific model
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)

curl http://localhost:3001/v1/chat/completions \
  -H "Authorization: Bearer freellmapi-your-gateway-key-here" \
  -H "Content-Type: application/json" \
  -d '{"model":"auto","messages":[{"role":"user","content":"Hello!"}]}'

Listing models

curl http://localhost:3001/v1/models \
  -H "Authorization: Bearer freellmapi-your-gateway-key-here"

Returns all models from your configured providers plus any AI Profiles. Use auto to let the router choose, or request a specific model by name.
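
For clients that do not use the OpenAI SDK, the same listing call can be made with Python's standard library. This is a minimal sketch: `build_models_request` and `list_model_ids` are illustrative helpers rather than part of FreeLLMAPI, and the response is assumed to follow the usual OpenAI shape with a top-level `data` array.

```python
import json
import urllib.request


def build_models_request(base_url: str, gateway_key: str) -> urllib.request.Request:
    """Build a GET request for the OpenAI-style model listing."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/models",
        headers={"Authorization": f"Bearer {gateway_key}"},
        method="GET",
    )


def list_model_ids(base_url: str, gateway_key: str) -> list[str]:
    """Send the request and return the model IDs (needs a running server)."""
    with urllib.request.urlopen(build_models_request(base_url, gateway_key)) as resp:
        payload = json.load(resp)
    # Assumption: OpenAI-compatible servers wrap the list in a "data" array.
    return [model["id"] for model in payload["data"]]
```

Calling `list_model_ids("http://localhost:3001/v1", "freellmapi-your-gateway-key-here")` against a running instance should return provider models, profile names, and `auto` if the server advertises it.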

Gateway API keys

Create keys in the dashboard under Keys → Gateway keys. The raw key is shown once at creation — only the hash is stored.
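
The "shown once, hash stored" pattern can be sketched as follows. This is not the project's implementation: SHA-256, the key length, and both function names are assumptions made for illustration. Only the `freellmapi-` prefix comes from this README.

```python
import hashlib
import hmac
import secrets

KEY_PREFIX = "freellmapi-"  # prefix shown in this README; the rest is illustrative


def mint_gateway_key() -> tuple[str, str]:
    """Return (raw_key, stored_hash). Only the hash would be persisted."""
    raw_key = KEY_PREFIX + secrets.token_urlsafe(32)
    stored_hash = hashlib.sha256(raw_key.encode("utf-8")).hexdigest()
    return raw_key, stored_hash


def verify_gateway_key(presented: str, stored_hash: str) -> bool:
    """Hash the presented key and compare in constant time."""
    candidate = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    return hmac.compare_digest(candidate, stored_hash)
```

Because only the hash is kept, a lost key cannot be recovered and has to be replaced with a newly minted one.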


HTTP proxy pool

v1.3 ships a full HTTP proxy pool. Configure it from the Proxies page or via env / .env:

| Env var | Default | Description |
| --- | --- | --- |
| PROXY_URL | | Single proxy URL. Wins over DB. e.g. http://user:pass@host:port |
| PROXY_POOL_STRATEGY | none | none (off), random, round-robin, least-latency |
| PROXY_BYPASS | | Comma-separated platform names that skip the proxy (e.g. groq,google) |

Single proxy — set PROXY_URL once. The dashboard tracks request count, success rate, latency, and bytes.

Pool — add rows in the Proxies page (label, URL, optional country), pick a strategy, and the router distributes each request across the pool. Per-key assignment rotates every hour to spread load and avoid burning any single egress.
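
One way to picture the hourly per-key rotation is a stable per-key offset plus an hour counter. The sketch below is an assumption about how such a scheme could work, not the project's actual algorithm, and `assigned_proxy_index` is a hypothetical name.

```python
import hashlib
import time
from typing import Optional


def assigned_proxy_index(key_id: str, pool_size: int, now: Optional[float] = None) -> int:
    """Map a provider key to a pool slot that advances by one every hour."""
    if pool_size <= 0:
        raise ValueError("pool_size must be positive")
    timestamp = time.time() if now is None else now
    hour_bucket = int(timestamp // 3600)
    # A stable per-key offset keeps different keys on different proxies.
    digest = hashlib.sha256(key_id.encode("utf-8")).digest()
    offset = int.from_bytes(digest[:8], "big")
    return (offset + hour_bucket) % pool_size
```

With this scheme a key stays on one proxy for the whole hour, then moves to the next slot, so over `pool_size` hours it visits every proxy once.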

Bypass list — platforms you don't want proxied (e.g. low-latency providers like Groq that you want to reach directly).

SOCKS proxies — supported via socks5:// and socks4:// URLs.
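
The strategy names and the bypass list can be illustrated with a toy pool. This is a sketch under stated assumptions: the `Proxy` and `ProxyPool` classes are hypothetical, and the real router may weigh proxy health and latency differently.

```python
import itertools
import random
from dataclasses import dataclass


@dataclass
class Proxy:
    url: str
    avg_latency_ms: float
    enabled: bool = True


class ProxyPool:
    """Toy pool supporting the PROXY_POOL_STRATEGY values listed above."""

    def __init__(self, proxies, strategy="round-robin", bypass=()):
        self.proxies = [p for p in proxies if p.enabled]
        self.strategy = strategy
        self.bypass = set(bypass)
        self._cycle = itertools.cycle(self.proxies) if self.proxies else None

    def pick(self, platform):
        """Return the proxy to use for a platform, or None for a direct call."""
        # Bypassed platforms, a disabled pool, or an empty pool go direct.
        if platform in self.bypass or self.strategy == "none" or not self.proxies:
            return None
        if self.strategy == "random":
            return random.choice(self.proxies)
        if self.strategy == "least-latency":
            return min(self.proxies, key=lambda p: p.avg_latency_ms)
        if self.strategy == "round-robin":
            return next(self._cycle)
        raise ValueError(f"unknown strategy: {self.strategy}")
```

Returning `None` for bypassed platforms keeps the direct-connection case explicit, so the caller decides how to open an unproxied connection.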

find-proxies script — scans a candidate URL list, geo-IP-verifies the country (France by default), probes connectivity, and writes the working set to working-proxies.txt for bulk import:

# 1. Find a batch of working proxies
node scripts/find-proxies.mjs --out working-proxies.txt

# 2. Import in the dashboard: Proxies → Bulk import → paste file

Dashboard

Live dashboard (v1.3 highlights)

  • In-flight proxy name — every active row shows which proxy the request is using
  • Growing token counts — output_tokens ticks up live as the model streams
  • Session totals — Tokens IN and Tokens OUT aggregated over the dashboard session
  • Per-user counts — events scoped to the logged-in user
  • Real-time routing events — which provider/model was tried, latency, outcome, all pushed over WebSocket

Keys page

Add provider credentials for any supported service. Features per key:

  • Live status dot — updates in real time: green (healthy), amber (rate-limited), red (error/disabled/401)
  • Inline editing — pencil icon to edit label, RPM cap, TPM cap without re-entering the key
  • Stats panel — click the dot or chevron to expand deep stats:
    • Total requests, success rate, error count, rate-limit hits
    • Latency: min / p50 / avg / p95 / p99 / max
    • 24-hour bar chart, colour-coded by error rate
    • Per-model breakdown with success rate and token counts
    • Error breakdown with occurrence counts
    • Last 25 requests with model, latency, tokens, and timestamp
    • Active cooldowns with reset times
  • Model discovery — one click calls the provider's /v1/models endpoint, adds new models to the catalog, and removes stale ones. Custom endpoint mode imports all advertised models in one shot.
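
For readers unfamiliar with the latency figures in the stats panel, the sketch below computes them with the nearest-rank percentile method. The method choice and both function names are assumptions; the dashboard may interpolate differently.

```python
import math


def percentile(latencies_ms: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% at or below it."""
    if not latencies_ms:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(latencies_ms)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def latency_summary(latencies_ms: list[float]) -> dict[str, float]:
    """Collect the six figures listed for the stats panel."""
    return {
        "min": min(latencies_ms),
        "p50": percentile(latencies_ms, 50),
        "avg": sum(latencies_ms) / len(latencies_ms),
        "p95": percentile(latencies_ms, 95),
        "p99": percentile(latencies_ms, 99),
        "max": max(latencies_ms),
    }
```

A p95 far above p50 usually points at occasional slow upstream responses rather than a uniformly slow key.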

Custom endpoints: use the "Custom" provider to connect any OpenAI-compatible URL. Toggle "Auto-discover" to bulk-register all models the endpoint advertises.

AI Profiles

Profiles act as virtual models. Build a ranked list of real models, choose a routing strategy, and the profile appears in /v1/models like any other model.

  • Context window — auto-set to the smallest model's window. Add a 4K model to a profile that has a 128K model and the cap drops to 4K; remove it and the cap goes back up. This keeps a request sized for the larger model from overflowing the smaller one when the router switches between them.
  • Routing strategies — failover (try in order), round-robin, least-latency
  • Search bar — filter by model name, provider, or model ID when building a profile
  • Per-profile stats — request volume, error rate, latency, and token counts
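
The context-window rule above amounts to taking a minimum over the profile's members. A minimal sketch, with a hypothetical function name and made-up model names:

```python
def profile_context_window(model_windows: dict[str, int]) -> int:
    """Return the effective window for a profile: the smallest member's."""
    if not model_windows:
        raise ValueError("a profile needs at least one model")
    return min(model_windows.values())


# Illustrative model names and window sizes.
windows = {"big-model": 128_000, "small-model": 4_096}
print(profile_context_window(windows))   # 4096

# Removing the small model raises the cap again.
del windows["small-model"]
print(profile_context_window(windows))   # 128000
```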

Proxies

The proxy pool page lets you:

  • Add HTTP / HTTPS / SOCKS proxies with labels and (optional) country
  • Pick a pool strategy: none, random, round-robin, least-latency
  • Bulk select with checkboxes and apply enable / disable / delete / test across many rows
  • Click a row to expand its stats panel (request count, success rate, latency, bytes)
  • Health-checker pings every enabled proxy on a schedule and marks failing ones

Playground

Interactive chat with:

  • Full model selector (providers, profiles, auto)
  • System prompt editor
  • Live routing indicator — shows the provider, model, and proxy the router is currently trying while a request is in flight

Analytics

Request history with per-provider breakdown, token usage, latency trends, and cost estimates.


Configuration

| Variable | Default | Description |
| --- | --- | --- |
| ENCRYPTION_KEY | (required) | 64-char hex string for AES-256-GCM. Generate: node -e "console.log(require('crypto').randomBytes(32).toString('hex'))" |
| PORT | 3001 | HTTP listen port |
| HOST_BIND | 127.0.0.1 | Bind interface. Set 0.0.0.0 for LAN access. |
| ADMIN_USERNAME | | Bootstrap admin username (first run only) |
| ADMIN_PASSWORD | | Bootstrap admin password, min 8 chars (first run only) |
| PROXY_RATE_LIMIT_RPM | 120 | Max /v1 requests per minute per client IP. 0 = disabled. |
| REQUEST_ANALYTICS_RETENTION_DAYS | 90 | Days to keep request history |
| REQUEST_ANALYTICS_MAX_ROWS | 100000 | Maximum request history rows |
| REQUEST_QUEUE_TIMEOUT_SECONDS | 60 | Seconds to hold a queued request before returning 429 |
| MODEL_REFRESH_INTERVAL_MINUTES | | Periodic live model list refresh. Unset = boot-time only. |
| FREELLMAPI_CONTEXT_HANDOFF | | Set on_model_switch to inject context messages on model switch |
| PROXY_URL | | Single proxy URL. e.g. http://user:pass@host:port |
| PROXY_POOL_STRATEGY | none | none / random / round-robin / least-latency |
| PROXY_BYPASS | | Comma-separated platform names to skip the proxy |
| DASHBOARD_ORIGINS | | Extra CORS origins for the dashboard (comma-separated) |
| LOG_LEVEL | INFO | DEBUG, INFO, WARN, ERROR, or FATAL |
| GATEWAY_INSTANCES | 1 | PM2: number of gateway worker processes |
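
If Node.js is not at hand, an ENCRYPTION_KEY in the documented format (32 random bytes as 64 hex characters) can also be produced from Python. The helper names below are illustrative, and the validator only checks the format, not whether the key matches an existing database.

```python
import re
import secrets


def generate_encryption_key() -> str:
    """Return 32 random bytes as 64 lowercase hex characters."""
    return secrets.token_hex(32)


def is_valid_encryption_key(value: str) -> bool:
    """Check the documented format: exactly 64 hexadecimal characters."""
    return re.fullmatch(r"[0-9a-fA-F]{64}", value) is not None


if __name__ == "__main__":
    print(generate_encryption_key())
```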

How routing works

When a request arrives at /v1/chat/completions:

  1. Resolve the model — a Profile name expands to its candidate list; auto scores all available models; anything else looks up provider keys for that model ID.
  2. Score and sort — each candidate key is scored by recent success rate, average latency, and rate-limit headroom. Lower-scored candidates move to the back of the queue.
  3. Pick a proxy — for the resolved key, look up its current pool assignment (or fall back to the single PROXY_URL); the assignment rotates every hour so each key gets a fair share.
  4. Try in order — the router picks the first key that isn't on cooldown, hasn't hit its outbound RPM/TPM cap, and has a healthy proxy.
  5. On failure — a rate-limit response (429) puts the key on a cooldown; an auth failure (401/403) marks it invalid and pushes a real-time status update to the dashboard.
  6. Queue if exhausted — if all candidates are on cooldown the request parks in an in-memory queue and retries when the earliest cooldown expires, up to REQUEST_QUEUE_TIMEOUT_SECONDS.
  7. Context handoff — if the winning model differs from the previous turn in the same session, a compact system message is prepended so the new model has context.

Every routing decision is logged (provider, model, key prefix, proxy, latency) and streamed to the live dashboard via WebSocket.
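
Steps 2, 4, 5, and 6 can be sketched in a few functions. This is a simplified illustration, not the project's router: the `Candidate` fields, the score weights, and the 60-second default cooldown are all assumptions, and proxy health and outbound caps are left out.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    key_id: str
    success_rate: float           # 0.0 to 1.0 over recent requests
    avg_latency_ms: float
    headroom: float               # unused fraction of the rate limit, 0.0 to 1.0
    cooldown_until: float = 0.0   # UNIX timestamp; 0 means no cooldown
    valid: bool = True            # set to False after a 401/403


def score(c: Candidate) -> float:
    """Higher is better. The weights are arbitrary illustrative values."""
    latency_factor = 1.0 / (1.0 + c.avg_latency_ms / 1000.0)
    return 0.5 * c.success_rate + 0.3 * latency_factor + 0.2 * c.headroom


def pick_candidate(candidates, now):
    """Return the best-scored usable candidate, or None if none is usable."""
    usable = [c for c in candidates if c.valid and c.cooldown_until <= now]
    return max(usable, key=score, default=None)


def record_failure(c: Candidate, status: int, now: float, cooldown_s: float = 60.0):
    """Step 5: a 429 starts a cooldown, a 401/403 invalidates the key."""
    if status == 429:
        c.cooldown_until = now + cooldown_s
    elif status in (401, 403):
        c.valid = False


def seconds_until_retry(candidates, now, queue_timeout_s=60.0):
    """Step 6: time to the earliest cooldown expiry, or None if too far away."""
    pending = [
        c.cooldown_until - now
        for c in candidates
        if c.valid and c.cooldown_until > now
    ]
    if not pending or min(pending) > queue_timeout_s:
        return None
    return min(pending)
```

When `pick_candidate` returns `None`, a queued request would sleep for `seconds_until_retry(...)` and try again, or give up with a 429 if that function also returns `None`.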


Project structure

freellmapi/
├── server/src/
│   ├── routes/        REST endpoints — /api/*, /v1/*
│   │                    (proxy, keys, profiles, gateway-keys, admin, fallback, embeddings, responses, settings…)
│   ├── services/      Router, queue, health checker, scoring, WebSocket push, auth, context handoff, proxy-health
│   ├── providers/     Per-provider OpenAI-compatible adapters (google, openai-compat, cohere, cloudflare)
│   ├── db/            SQLite (better-sqlite3), migration runner, model catalog migrations
│   └── lib/           Crypto, proxy pool, logger, error handling
├── client/src/
│   ├── pages/         Keys, Profiles, Proxies, Gateway keys, Playground, Analytics, Live dashboard, Fallback, Premium, Admin…
│   └── components/    Shared UI components (shadcn/ui)
├── shared/            TypeScript types shared between server and client
├── desktop/           Electron desktop wrapper
├── scripts/           find-proxies.mjs and other ops helpers
├── k8s/               Kubernetes / Podman play kube manifests
├── .gitea/workflows/  Forgejo Actions (ci.yml, release.yml)
├── .github/workflows/ GitHub Actions (ci.yml, docker.yml)
├── ecosystem.config.cjs  PM2 process definition
├── Dockerfile         Multi-stage Docker build
├── docker-compose.yml
├── install.sh         Universal installer
└── migrate-from-old.sh  Import keys from an older FreeLLMAPI install

Continuous integration

Two Forgejo Actions workflows ship with the repo and are picked up automatically by your Forgejo runner:

| Workflow | Trigger | What it does |
| --- | --- | --- |
| .gitea/workflows/ci.yml | push to main, pull request | npm ci → build server → build client → run 505 server tests |
| .gitea/workflows/release.yml | push of any v* tag (e.g. v1.3) | same as CI, then posts a green-build summary to the run page |

A push of the v1.3 tag will start a release build on the runner and report success or failure on the release page. Run the same suite locally with:

npm install
npm run build
npm test -w server

Contributing

git clone https://git.pandem.fr/outage.sh/FreeLLMapi freellmapi
cd freellmapi
npm install
npm run dev        # server on :3001, Vite dev server on :5173

  • npm run build — full production build (server + client)
  • npm test -w server — server test suite (505 tests)
  • npm run build:server — server only
  • npm run desktop:dev — Electron desktop wrapper in dev mode
  • npm run desktop:dist — build a distributable Electron binary

PRs welcome. Keep changes focused; include tests for new server behaviour.


Disclaimer

FreeLLMAPI routes requests to third-party AI provider APIs under their respective free-tier terms. It does not circumvent rate limits — it distributes load across multiple API keys you legitimately own. Review each provider's Terms of Service before use. NVIDIA's free NIM tier is for evaluation only. The authors are not responsible for ToS violations or API key misuse.


MIT License · Based on FreeLLMAPI