OneAPIforeverymodelyourstackwillneed.

The unified inference layer for GPT, Claude, Gemini, DeepSeek, Llama and 50+ models. Transparent pricing, predictable latency, one key.

All systems operational · 99.99% uptime
OpenAI

GPT-4o

Input · per 1M tokens
Official$5.00
TokenLayer$3.50-30%
Output · per 1M tokens
Official$15.00
TokenLayer$10.50-30%

Average savings: 30% across all models

8 models · Auto-rotating · Use arrows or dots to navigate

01
Edge uptime
0.00%
02
Models in catalog
0+

Capabilities

Theinfrastructurelayerbetweenyouandeverymodel.

Everything you need to ship reliable AI products — without managing six provider contracts.

01 / Unified
GPTClaudeGeminiLlama

One API. Every model.

Swap between GPT, Claude, Gemini, DeepSeek, Llama and 50+ models without rewriting a single line. OpenAI-compatible by default.

02 / Cost
before$2,840/mo
with tokenlayer$960/mo
−66% on identical workloads

Pay less for the same answer.

Smart routing picks the cheapest model that meets your quality bar.

03 / Speed

Edge-routed inference.

Global PoPs route to the fastest available provider in under 80ms.

04 / Billing
OpenAI
$412
Anthropic
$298
Google
$140
DeepSeek
$54
Meta
$32
Total
$936

One invoice, all providers.

No more juggling six vendor dashboards. Set budgets, allocate by team.

05 / Analytics

See every token.

Per-model latency, error rates and cost breakdowns. Built-in tracing.

06 / DX
$ npm i @tokenlayer/sdk
$ export TOKENLAYER_KEY=tl_***
$ node app.js

Built for developers.

First-class SDKs in TypeScript, Python, Go. Streaming, tools, structured output.

How it works

Three steps from signup to scale.

est. setup time
≈ 4 minutes
01

Create your account.

Sign up in 60 seconds. No credit card. $5 of free credits to play with every model.

signup —› tokenlayer.net/get-started
02

Grab your API key.

Generate a key, set a monthly cap, and we'll handle quota across every provider.

TOKENLAYER_KEY=tl_live_a1b2c3d4...
03

Start building.

Drop-in compatible with the OpenAI SDK. Change baseURL — keep your existing code.

client.chat.completions.create({ model: "auto" })

The model catalog

Every frontier model. Always up to date.

From the day a new model ships, it's available on TokenLayer — with the same SDK call, same billing, same observability.

OpenAI
GPT

Flagship reasoning and multimodal models.

4o4o-minio1o3
model.gpt
Anthropic
Claude

Long-context and coding-grade quality.

3.5 Sonnet3.5 Haiku3 Opus
model.claude
Google
Gemini

Native multimodal with 2M+ token context.

2.0 Pro2.0 Flash1.5 Pro
model.gemini
DeepSeek
DeepSeek

Frontier-grade reasoning at a fraction of the cost.

V3R1Coder
model.deepseek
Meta
Llama

Open-weight models optimized for speed.

3.3 70B3.2 90BGuard
model.llama
+ 50 more
And more

Every notable provider, one endpoint.

MistralQwenxAICohere
model.andmore
+ More daily

Every model. The day it ships.

We add new frontier models within hours of release, with full parity for streaming, tools and structured outputs.

Calculate your savings

See how much you'll save.

Adjust the sliders to estimate your monthly costs based on real provider pricing.

Tip: Drag the sliders or click any value to type your own
Select Model
1001.0M
10032,000
Total tokens / mo
20.0M
OpenAI
$5/$15 / 1M
Your monthly cost
Direct from provider
$180
With TokenLayer-80%
$36.00
You save / month
$144
You save / year
$1.73k
Start saving today

Estimates based on 60/40 input/output split. Prices from pricepertoken.com

Developer experience

Fiveminutestoyourfirsttoken.

OpenAI-compatible API, first-class SDKs, native streaming, structured outputs, tools, and observability. The boring parts, done right.

  • Drop-in replacement for OpenAI SDK
  • Streaming + structured outputs out of the box
  • Automatic retries, fallback chains, request signing
  • Per-request tracing with full payloads
app.ts
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
// One client, any model.
import { TokenLayer } from "@tokenlayer/sdk";

const tokenlayer = new TokenLayer({
  apiKey: process.env.TOKENLAYER_KEY,
});

const res = await tokenlayer.chat.completions.create({
  model: "auto",          // routes to cheapest model
  messages: [{ role: "user", content: "Summarize Q3." }],
  fallback: ["claude-3.5-sonnet", "gpt-4o"],
  stream: true,
});

for await (const chunk of res) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
● 200 OK
642ms · 1,284 tok

Pricing

Transparent.Nosurprises.

Pay only for what your models consume — plus a flat per-seat fee. No markups on tokens, ever.

Hobby

For tinkering & weekend projects.

$0/ month
Start free
  • $5 free credits
  • All models, pay-as-you-go
  • 1 project, 1 API key
  • Community support
POPULAR

Pro

For startups shipping in production.

$23/ seat / month
Start 14-day trial
  • Unlimited projects & keys
  • Smart routing + fallback
  • Team analytics & budgets
  • Priority routing & SLA
  • Email + Slack support

Scale

For teams running mission-critical AI.

Custom
Talk to sales
  • Volume discounts
  • Dedicated routing tier
  • BYO keys + private deploy
  • SOC 2, HIPAA, BAA
  • Dedicated CSM

Token costs passed through at provider list price. See full pricing →

What you can do

Buildmore.Spendless.Sleepbetter.

Concrete outcomes you can expect after pointing your stack at TokenLayer.

Avg. 41% cost reduction in week one
Save
before
$2,840
with tokenlayer
$960
−66%

Cut your AI spend by up to 70%.

Smart routing picks the cheapest model that hits your quality bar. Most teams save $1,800+ per developer per month.

Compatible
api.openai.com
api.tokenlayer.net
// every other line stays the same

OpenAI & Anthropic SDK compatible.

Drop-in replacement. Change one line — your base URL — and keep every line of code you already shipped.

Works with
CursorClaude CodeContinueAider

Cursor, Claude Code, Continue, Aider.

Use TokenLayer as the inference layer for your favorite AI coding tools. One key powers your whole stack.

Failover
GPTDOWNClaudeOK200

Stay up when providers don't.

Define a fallback chain. If OpenAI is down, requests transparently route to Claude. No incidents, no dropped users.

Ship faster
GPT-4o
78%
Claude 3.5
92%
Gemini 2.0
71%

Compare models in production.

A/B test GPT-4o vs Claude 3.5 vs Gemini side-by-side on real traffic. Pick the winner from real data, not vibes.

Predictable
monthly cap · team-frontend$642 / $1,000
alert at 80% · auto-pause at 100%

Budgets, caps, and rate limits.

Set per-key, per-team, per-model monthly limits. Get alerts before you blow through them.

Questions

Common questions, honest answers.

Still curious? Our team responds to every email within a business day.

Yes. Point your base URL at api.tokenlayer.net/v1 and use any model name. Our endpoints are 100% compatible with the OpenAI Chat Completions and Responses APIs — including streaming, tool calling, and structured outputs.

Ready when you are

Buildwitheverymodelbeforelunch.

$5 of credits when you sign up. No credit card. Cancel any time.

SOC 2 Type IIGDPR · HIPAA · BAA99.99% Uptime SLA14 global PoPs