AI API Pricing: A 2025 Guide to Tokens, Models, and Costs

0:00

Demystifying AI API Pricing in 2025: Tokens, Models, and Costs

Posted on September 9, 2025 by AI Insights Blog

API access to AI models powers everything from apps to enterprise systems, but pricing can be complex with token-based billing. In this post, we cover the latest AI API pricing (non-chat plans), sorted by input cost per million tokens from low to high. For ties, the most recent models come first.

Data is up-to-date as of September 9, 2025. We've included recent stealth models from OpenRouter (like Sonoma and Horizon series) in the free tier, as they're in alpha/beta testing. For subscription-based chat plans, see our other post on AI Chat Subscription Plans in 2025.

We've added notes on OpenAI's Realtime API updates from August 28, 2025, which introduce advanced speech-to-speech capabilities, MCP server support, image input, and SIP phone calling—potentially impacting API usage in voice agents (reflected in relevant comments where applicable).

AI API Pricing (Non-AI Chat Plans) - FREE

Free APIs are great for testing, often with logging for feedback.

Provider	Model	Input $/M Tokens	Output $/M Tokens	Comments (Pricing Details, Context, Notes)	Recency
xAI	grok code 1	$0.00	$0.00	Free via launch partners through noon PT on Sep 10, 2025; then $0.20/M input and $1.50/M output apply.	September 9, 2025
OpenRouter	Sonoma Sky Alpha	$0.00	$0.00	Cloaked “stealth” frontier model; 2,000,000-token context, supports image inputs and parallel tool calling; free during testing; prompts and completions logged for feedback. vision + tools; free test with logging.	September 9, 2025
OpenRouter	Sonoma Dusk Alpha	$0.00	$0.00	Cloaked “stealth” frontier model optimized for speed; 2,000,000-token context, supports image inputs and parallel tool calling; free during testing; prompts and completions logged for feedback. (faster); vision + tools; free test with logging.	September 9, 2025
OpenRouter	Horizon Beta	$0.00	$0.00	256K-token context cloaked model; improved successor to Horizon Alpha; free during testing; prompts/completions logged.	September 9, 2025
OpenRouter	Horizon Alpha	$0.00	$0.00	256K-token context early cloaked model; deprecated in favor of Horizon Beta.	September 9, 2025
OpenRouter	Cypher Alpha	$0.00	$0.00	Cloaked all-purpose model; 1,000,000-token context; free testing; logging enabled by provider.	September 9, 2025

AI API Pricing (Non-AI Chat Plans) - PAID

Paid APIs scale with usage, often with caching discounts. Note: OpenAI's Realtime API (August 2025 update) adds speech-to-speech and voice agent features, billed at model rates—check for integration in models like GPT-4o.

Provider	Model	Input $/M Tokens	Output $/M Tokens	Comments (Pricing Details, Context, Notes)	Recency
Zhipu AI	GLM-4.5-airx	$0.02	$0.06	Ultra-fast, lowest cost model.	September 9, 2025
OpenAI	GPT-5 nano	$0.05	$0.40	Entry-level GPT-5 model; 400K input/128K output context.	September 9, 2025
Meta	Llama 3.2 11b Vision (via Deepinfra)	$0.055	$0.055	128K context, 2K max output, vision capabilities.	September 9, 2025
Alibaba	Qwen-Turbo	$0.0525	$0.21	1M context; extremely low-cost; per 1K: $0.00005/$0.0002.	September 8, 2025
DeepSeek	deepseek-reasoner	$0.07	$1.68	Input pricing is for cache hits; cache miss is $0.56/M.	September 8, 2025
DeepSeek	deepseek-chat	$0.07	$1.68	Input pricing is for cache hits; cache miss is $0.56/M. Off-peak discounts ended Sep 5, 2025.	September 8, 2025
OpenAI	GPT-4.1 nano	$0.10	$0.40	Lightweight GPT-4.1 variant; 1M input/32K output context.	September 9, 2025
Google	2.5 Flash-Lite	$0.10 - $0.30	$0.40	Economical. Caching (Input $/M): $0.025-$0.125.	September 8, 2025
Google	2.0 Flash	$0.10 - $0.70	$0.40	Text/img/video/audio. Live API: $0.35-$2.10 in / $1.50-$8.50 out. Caching (Input $/M): $0.025-$0.175. Batch (I/O $/M): $0.05 / $0.20.	September 8, 2025
OpenAI	GPT-4o Mini	$0.15	$0.60	Cost-effective and fast option for high-volume tasks.	September 9, 2025
Zhipu AI	GLM-4.5-air	$0.16	$1.07	Cost-effective lightweight model.	September 9, 2025
Meta	Llama 3.3 70b (via Deepinfra)	$0.23	$0.40	128K context, 2K max output.	September 9, 2025
OpenAI	GPT-5 mini	$0.25	$2.00	Mid-tier GPT-5 model; 400K input/128K output context.	September 9, 2025
Anthropic	Claude 3 Haiku	$0.25	$1.25	Most economical; fastest and most compact. Prompt Caching (Write/Read $/M): $0.30 / $0.03. Batch (I/O $/M): $0.125 / $0.625.	September 9, 2025
DeepSeek	DeepSeek-V3	$0.27	$1.10	64K context, 8K max output.	September 9, 2025
Alibaba	QVQ-72B-Preview	$0.2756	$0.5513	Preview version; up to 97% price cuts reported.	September 8, 2025
xAI	grok-3-mini	$0.30	$0.50	Budget model; 8 rps. Cached Input $/M: $0.075. Live Search: $25/1K Sources.	September 8, 2025
Google	2.5 Flash	$0.30 - $1.00	$2.50	Text/img/video/audio. Live API: $0.50-$3.00 in / $2.00-$12.00 out. Caching (Input $/M): $0.075-$0.25.	September 8, 2025
Alibaba	Qwen2.5 72B/7B	$0.30 - $0.35	$0.40	Large/small options; ~$0.0004 per 1K for Plus equivalents.	September 8, 2025
Zhipu AI	GLM-4.5	$0.33	$1.32	131K context. Zhipu is actively promoting migration for former Claude users, offering 20 million free tokens.	September 9, 2025
Meta	Llama 3.2 90b Vision (via Deepinfra)	$0.35	$0.40	128K context, 2K max output, vision capabilities.	September 9, 2025
OpenAI	GPT-4.1 mini	$0.40	$1.60	Smaller GPT-4.1 variant; 1M input/32K output context.	September 9, 2025
Alibaba	Qwen-VL-Max	$0.41	Similar to Input	Vision model. Price was listed as "$0.41 per 1K" which is assumed to be a typo for "$0.41 per 1M tokens".	September 8, 2025
Alibaba	Qwen-Plus	$0.42	$1.26	131K context; tiered increases >128K; non-thinking ≤128K: $0.115/$0.287.	September 8, 2025
OpenAI	GPT-3.5 Turbo	$0.50	$1.50	16K context, 4K max output.	September 9, 2025
Zhipu AI	GLM-4.5v	$0.50	$1.80	Vision model with 65.5K context.	September 9, 2025
DeepSeek	DeepSeek-R1	$0.55	$2.19	64K context, 8K max output.	September 9, 2025
Meta	Llama 3 70b (via Deepinfra/Groq)	$0.59	$0.79	8K context, 2K max output.	September 9, 2025
xAI	grok-3-mini-fast	$0.60	$4.00	Faster budget; 3 rps. Cached Input $/M: $0.15. Live Search: $25/1K Sources.	September 8, 2025
Alibaba	Qwen3 235B Thinking	$0.735	$8.82	Thinking mode; flagship 235B; $0.70/$2.80 non-thinking.	September 8, 2025
Anthropic	Claude 3.5 Haiku	$0.80	$4.00	Economical; near-instantaneous responses. Prompt Caching (Write/Read $/M): $1.00 / $0.08. Batch (I/O $/M): $0.40 / $2.00.	September 9, 2025
Alibaba	Qwen3-Max-Preview (New)	$0.861+	$3.441+	Tiered pricing based on context (0-32K, 32K-128K, 128K-252K). New 1T+ parameter model; 262K context.	September 6, 2025
Perplexity	Sonar	$1.00	$1.00	Lightweight, cost-effective; quick facts, news.	September 8, 2025
Perplexity	Sonar Reasoning	$1.00	$5.00	Quick problem-solving with step-by-step logic.	September 8, 2025
OpenAI	o1-mini	$1.10	$4.40	128K context, 65K max output, reasoning-optimized.	September 9, 2025
OpenAI	o3-mini	$1.10	$4.40	200K context, 100K max output.	September 9, 2025
OpenAI	o4-mini	$1.10	$4.40	200K context, 100K max output.	September 9, 2025
OpenAI	GPT-5	$1.25	$10.00	Latest GPT-5 model; 400K input/128K output context.	September 9, 2025
Google	2.5 Pro (≤200K)	$1.25	$10.00	Text/img/video; standard context. Caching (Input $/M): $0.31. Batch (I/O $/M): $0.625 / $5.00.	September 8, 2025
Google	1.5 Pro	$1.25 / $2.50	$5.00 / $10.00	Previous generation. Pricing is for ≤128K / >128K context. Caching (Input $/M): $0.3125 / $0.625.	September 8, 2025
Alibaba	Qwen2.5-Max	$1.60	$6.40	32K context, 8K max output.	September 9, 2025
Alibaba	Qwen-Max	$1.68	$6.72	32K context; Qwen-Max-2025-01-25 version; free 1M tokens for new users (180 days validity).	September 8, 2025
Meta	Llama 3.1 405b (via Deepinfra)	$1.79	$1.79	128K context, 2K max output.	September 9, 2025
OpenAI	GPT-4.1	$2.00	$8.00	1M input/32K output context.	September 9, 2025
OpenAI	o3	$2.00	$8.00	200K context, 100K max output (alternate pricing from search result 4).	September 9, 2025
xAI	grok-2	$2.00	$10.00	General; 15 rps; reasoning tokens as completion.	September 8, 2025
xAI	grok-2-vision	$2.00	$10.00	Vision capabilities; 10 rps.	September 8, 2025
Perplexity	Sonar Reasoning Pro	$2.00	$8.00	Enhanced multi-step reasoning with web search.	September 8, 2025
Perplexity	Sonar Deep Research	$2.00	$8.00	Exhaustive research; detailed reports. Citation: $2.00/M, Search: $5.00/1K, Reasoning: $3.00/M.	September 8, 2025
OpenAI	GPT-4o	$2.50	$10.00	128K context, 16K max output (alternate pricing from search result 3). Supports Realtime API features like speech-to-speech and image input (Aug 2025 update).	September 9, 2025
Google	2.5 Pro (>200K)	$2.50	$15.00	Large context. Caching (Input $/M): $0.625. Batch (I/O $/M): $1.25 / $7.50.	September 8, 2025
Anthropic	Claude 3.7 Sonnet	$3.00	$15.00	Blends fast responses with deeper reasoning. Prompt Caching (Write/Read $/M): $3.75 / $0.30. Batch (I/O $/M): $1.50 / $7.50.	September 9, 2025
Anthropic	Claude 3.5 Sonnet	$3.00	$15.00	25% price reduction in 2025; balanced speed/intelligence. Prompt Caching (Write/Read $/M): $3.75 / $0.30. Batch (I/O $/M): $1.50 / $7.50.	September 9, 2025
xAI	Grok-3-Preview	$3.00	$15.00	131K context.	September 9, 2025
xAI	Grok 4	$3.00	$15.00	1M input/256K output context.	September 9, 2025
Cohere	Command-R+	$3.00	$15.00	Enterprise-grade model.	September 9, 2025
Anthropic	Claude 4 Sonnet	$3.00	$15.00	Enhanced coding/reasoning; 1M token context for tiered orgs. Prompt Caching (Write/Read $/M): $3.75 / $0.30. Batch (I/O $/M): $1.50 / $7.50.	September 9, 2025
Anthropic	Claude 3 Sonnet	$3.00	$15.00	Previous generation balanced model. Prompt Caching (Write/Read $/M): $3.75 / $0.30. Batch (I/O $/M): $1.50 / $7.50.	September 9, 2025
Perplexity	Sonar Pro	$3.00	$15.00	Advanced search; deeper content understanding; 2x citations vs base.	September 8, 2025
xAI	grok-3	$3.00	$15.00	Standard; 10 rps. Cached Input $/M: $0.75. Live Search: $25/1K Sources.	September 8, 2025
Zhipu AI	GLM-4.5-flash	$3.20	$12.80	Strong reasoning capabilities.	September 9, 2025
OpenAI	GPT-4o (standard pricing)	$5.00	$15.00	Powerful and efficient flagship model for general-purpose tasks.	September 9, 2025
xAI	grok-3-fast	$5.00	$25.00	Fast standard; 10 rps. Cached Input $/M: $1.25. Live Search: $25/1K Sources.	September 8, 2025
OpenAI	GPT-4 Turbo	$10.00	$30.00	Previous generation model, still capable for many applications.	September 9, 2025
OpenAI	o3 (higher tier)	$10.00	$40.00	200K context, 100K max output (from search result 3).	September 9, 2025
OpenAI	o1-preview	$15.00	$60.00	128K context, 32K max output.	September 9, 2025
OpenAI	o1-2024-12-17	$15.00	$60.00	200K context, 100K max output.	September 9, 2025
OpenAI	o1	$15.00	$60.00	200K context, 100K max output.	September 9, 2025
Anthropic	Claude 4.1 Opus	$15.00	$75.00	Latest Opus with enhanced capabilities. Prompt Caching (Write/Read $/M): $18.75 / $1.50. Batch (I/O $/M): $7.50 / $37.50.	September 9, 2025
Anthropic	Claude 4 Opus	$15.00	$75.00	Most powerful; best for highly complex research/analysis. Prompt Caching (Write/Read $/M): $18.75 / $1.50. Batch (I/O $/M): $7.50 / $37.50.	September 9, 2025
Anthropic	Claude 3 Opus	$15.00	$75.00	Previous flagship for complex tasks. Prompt Caching (Write/Read $/M): $18.75 / $1.50. Batch (I/O $/M): $7.50 / $37.50.	September 9, 2025
OpenAI	o3‑pro	$20.00	$80.00	200K context, 100K max output, professional tier.	September 9, 2025
OpenAI	GPT-4	$30.00	$60.00	8K context, 8K max output.	September 9, 2025
OpenAI	GPT-4.5	$75.00	$150.00	128K context, 16K max output, preview model.	September 9, 2025
Alibaba	Qwen3-Coder-Plus	Standard	Standard	Limited-time offer from Jul 24, 2025; 25% off with cache hit. Pricing not specified.	September 8, 2025
OpenAI	GPT-5 Series	Premium	Premium	Newer, more advanced models with premium pricing for cutting-edge tasks.	September 9, 2025

Platform Fees (OpenRouter)

OpenRouter aggregates models with these fees:

Service/Feature	Cost/Details	Comments (Changes, Notes)	Recency
Zero-Completion Insurance	$0 (automatic)	Protection for zero-token or errored responses; automatically applied to all accounts.	September 9, 2025
Sonoma Alpha Models	Free during alpha (2 M context)	Sonoma Sky/Dusk Alpha are cloaked models; free test access; logging enabled by creator.	September 9, 2025
Horizon Alpha/Beta Models	Free during beta (256 K context)	Horizon Alpha deprecated; Horizon Beta active; both were free test models; logging noted on model pages.	September 9, 2025
Credit Purchase Fee	5.5 % (min $0.80)	Fee when purchasing credits via Stripe; crypto payments fee is 5%.	September 9, 2025
BYOK Fee	5 % of provider cost	Charged against credits when routing with your own provider key(s).	September 9, 2025
Platform Fee on Inference	None (pass-through pricing)	OpenRouter passes through provider pricing without markup on inference; no additional per-inference commission charged by OpenRouter.	September 9, 2025

Acronyms and Terms

To make sense of the terminology, here's a handy reference table:

Acronym/Term	Full Name/Description	Context/Usage in AI Plans
API	Application Programming Interface	Programmatic access to models (e.g., console.anthropic.com for Claude)
Annual	Yearly Billing	Discounted pricing for annual commitments
BYOK	Bring Your Own Key	Use personal API keys on platforms like OpenRouter while paying platform fees
Cloaked Model	A pre-release or experimental model from an undisclosed creator, often for testing purposes.	Models like Sonoma Sky/Dusk Alpha on OpenRouter are released as "cloaked" to gather feedback before a potential public launch.
CoT	Chain of Thought	Step-by-step reasoning in models (e.g., GPT-5 Thinking)
Ctx	Context	Token window size for inputs (e.g., 131K for GLM-4.5)
GDPR	General Data Protection Regulation	EU privacy law compliance in plans
HIPAA	Health Insurance Portability and Accountability Act	US healthcare privacy compliance
I/O	Input/Output	Token pricing separation (e.g., input vs. output rates)
IDE	Integrated Development Environment	Code development tools (e.g., Claude Code in terminal)
INR	Indian Rupee	Currency for regional pricing (e.g., ChatGPT Go)
LLM	Large Language Model	Core AI models (e.g., Claude, GPT) trained on text data
M Tokens	Million Tokens	Usage/pricing unit (~750 words = 1,000 tokens)
MoE	Mixture of Experts	Architecture in some models (e.g., implied in advanced tiers)
Monthly	Monthly Billing	Standard month-to-month pricing
ROI	Return on Investment	Business value metric for premium plans
RPS	Requests Per Second	API rate limits (e.g., 10 rps for grok-3)
SAML	Security Assertion Markup Language	Authentication standard (e.g., ChatGPT Business)
SCIM	System for Cross-domain Identity Management	Automates user provisioning in enterprise plans
SLA	Service Level Agreement	Uptime/performance guarantees in Enterprise plans
SOC 2	Service Organization Control 2	Security compliance framework (e.g., for enterprise data)
SSO	Single Sign-On	Enterprise login for multiple services (e.g., in Claude/ChatGPT Enterprise)
Stealth Model	An experimental or pre-release model from an undisclosed provider, released anonymously for testing	OpenRouter partners with major AI companies to release "testing versions" for feedback collection
TTL	Time To Live	Cache duration (e.g., in prompt caching)
USD	United States Dollar	Primary currency for pricing
VLM	Vision Language Model	Models handling text + images (e.g., GLM-4.5v, grok-2-vision)
ZCI	Zero Completion Insurance	OpenRouter feature that waives charges for zero-token/error responses.

Wrapping Up

API pricing is token-driven, so optimize for caching to save costs. With updates like OpenAI's Realtime API, voice and multimodal features are becoming more accessible. Stay tuned for more innovations!

Demystifying AI API Pricing in 2025: Tokens, Models, and Costs

Posted on September 9, 2025 by AI Insights Blog

AI API Pricing (Non-AI Chat Plans) - FREE

Free APIs are great for testing, often with logging for feedback.

Provider

Model

Input $/M Tokens

Output $/M Tokens

Comments (Pricing Details, Context, Notes)

Recency

xAI

grok code 1

$0.00

Free via launch partners through noon PT on Sep 10, 2025; then $0.20/M input and $1.50/M output apply.

September 9, 2025

OpenRouter

Sonoma Sky Alpha

$0.00

Cloaked “stealth” frontier model; 2,000,000-token context, supports image inputs and parallel tool calling; free during testing; prompts and completions logged for feedback. vision + tools; free test with logging.

September 9, 2025

OpenRouter

Sonoma Dusk Alpha

$0.00

Cloaked “stealth” frontier model optimized for speed; 2,000,000-token context, supports image inputs and parallel tool calling; free during testing; prompts and completions logged for feedback. (faster); vision + tools; free test with logging.

September 9, 2025

OpenRouter

Horizon Beta

$0.00

256K-token context cloaked model; improved successor to Horizon Alpha; free during testing; prompts/completions logged.

September 9, 2025

OpenRouter

Horizon Alpha

$0.00

256K-token context early cloaked model; deprecated in favor of Horizon Beta.

September 9, 2025

OpenRouter

Cypher Alpha

$0.00

Cloaked all-purpose model; 1,000,000-token context; free testing; logging enabled by provider.

September 9, 2025

AI API Pricing (Non-AI Chat Plans) - PAID

Provider

Model

Input $/M Tokens

Output $/M Tokens

Comments (Pricing Details, Context, Notes)

Recency

Zhipu AI

GLM-4.5-airx

$0.02

$0.06

Ultra-fast, lowest cost model.

September 9, 2025

OpenAI

GPT-5 nano

$0.05

$0.40

Entry-level GPT-5 model; 400K input/128K output context.

September 9, 2025

Meta

Llama 3.2 11b Vision (via Deepinfra)

$0.055

128K context, 2K max output, vision capabilities.

September 9, 2025

Alibaba

Qwen-Turbo

$0.0525

$0.21

1M context; extremely low-cost; per 1K: $0.00005/$0.0002.

September 8, 2025

DeepSeek

deepseek-reasoner

$0.07

$1.68

Input pricing is for cache hits; cache miss is $0.56/M.

September 8, 2025

DeepSeek

deepseek-chat

$0.07

$1.68

Input pricing is for cache hits; cache miss is $0.56/M. Off-peak discounts ended Sep 5, 2025.

September 8, 2025

OpenAI

GPT-4.1 nano

$0.10

$0.40

Lightweight GPT-4.1 variant; 1M input/32K output context.

September 9, 2025

Google

2.5 Flash-Lite

$0.10 - $0.30

$0.40

Economical. Caching (Input $/M): $0.025-$0.125.

September 8, 2025

Google

2.0 Flash

$0.10 - $0.70

$0.40

Text/img/video/audio. Live API: $0.35-$2.10 in / $1.50-$8.50 out. Caching (Input $/M): $0.025-$0.175. Batch (I/O $/M): $0.05 / $0.20.

September 8, 2025

OpenAI

GPT-4o Mini

$0.15

$0.60

Cost-effective and fast option for high-volume tasks.

September 9, 2025

Zhipu AI

GLM-4.5-air

$0.16

$1.07

Cost-effective lightweight model.

September 9, 2025

Meta

Llama 3.3 70b (via Deepinfra)

$0.23

$0.40

128K context, 2K max output.

September 9, 2025

OpenAI

GPT-5 mini

$0.25

$2.00

Mid-tier GPT-5 model; 400K input/128K output context.

September 9, 2025

Anthropic

Claude 3 Haiku

$0.25

$1.25

Most economical; fastest and most compact. Prompt Caching (Write/Read $/M): $0.30 / $0.03. Batch (I/O $/M): $0.125 / $0.625.

September 9, 2025

DeepSeek

DeepSeek-V3

$0.27

$1.10

64K context, 8K max output.

September 9, 2025

Alibaba

QVQ-72B-Preview

$0.2756

$0.5513

Preview version; up to 97% price cuts reported.

September 8, 2025

xAI

grok-3-mini

$0.30

$0.50

Budget model; 8 rps. Cached Input $/M: $0.075. Live Search: $25/1K Sources.

September 8, 2025

Google

2.5 Flash

$0.30 - $1.00

$2.50

Text/img/video/audio. Live API: $0.50-$3.00 in / $2.00-$12.00 out. Caching (Input $/M): $0.075-$0.25.

September 8, 2025

Alibaba

Qwen2.5 72B/7B

$0.30 - $0.35

$0.40

Large/small options; ~$0.0004 per 1K for Plus equivalents.

September 8, 2025

Zhipu AI

GLM-4.5

$0.33

$1.32

131K context. Zhipu is actively promoting migration for former Claude users, offering 20 million free tokens.

September 9, 2025

Meta

Llama 3.2 90b Vision (via Deepinfra)

$0.35

$0.40

128K context, 2K max output, vision capabilities.

September 9, 2025

OpenAI

GPT-4.1 mini

$0.40

$1.60

Smaller GPT-4.1 variant; 1M input/32K output context.

September 9, 2025

Alibaba

Qwen-VL-Max

$0.41

Similar to Input

Vision model. Price was listed as "$0.41 per 1K" which is assumed to be a typo for "$0.41 per 1M tokens".

September 8, 2025

Alibaba

Qwen-Plus

$0.42

$1.26

131K context; tiered increases >128K; non-thinking ≤128K: $0.115/$0.287.

September 8, 2025

OpenAI

GPT-3.5 Turbo

$0.50

$1.50

16K context, 4K max output.

September 9, 2025

Zhipu AI

GLM-4.5v

$0.50

$1.80

Vision model with 65.5K context.

September 9, 2025

DeepSeek

DeepSeek-R1

$0.55

$2.19

64K context, 8K max output.

September 9, 2025

Meta

Llama 3 70b (via Deepinfra/Groq)

$0.59

$0.79

8K context, 2K max output.

September 9, 2025

xAI

grok-3-mini-fast

$0.60

$4.00

Faster budget; 3 rps. Cached Input $/M: $0.15. Live Search: $25/1K Sources.

September 8, 2025

Alibaba

Qwen3 235B Thinking

$0.735

$8.82

Thinking mode; flagship 235B; $0.70/$2.80 non-thinking.

September 8, 2025

Anthropic

Claude 3.5 Haiku

$0.80

$4.00

Economical; near-instantaneous responses. Prompt Caching (Write/Read $/M): $1.00 / $0.08. Batch (I/O $/M): $0.40 / $2.00.

September 9, 2025

Alibaba

Qwen3-Max-Preview (New)

$0.861+

$3.441+

Tiered pricing based on context (0-32K, 32K-128K, 128K-252K). New 1T+ parameter model; 262K context.

September 6, 2025

Perplexity

Sonar

$1.00

Lightweight, cost-effective; quick facts, news.

September 8, 2025

Perplexity

Sonar Reasoning

$1.00

$5.00

Quick problem-solving with step-by-step logic.

September 8, 2025

OpenAI

o1-mini

$1.10

$4.40

128K context, 65K max output, reasoning-optimized.

September 9, 2025

OpenAI

o3-mini

$1.10

$4.40

200K context, 100K max output.

September 9, 2025

OpenAI

o4-mini

$1.10

$4.40

200K context, 100K max output.

September 9, 2025

OpenAI

GPT-5

$1.25

$10.00

Latest GPT-5 model; 400K input/128K output context.

September 9, 2025

Google

2.5 Pro (≤200K)

$1.25

$10.00

Text/img/video; standard context. Caching (Input $/M): $0.31. Batch (I/O $/M): $0.625 / $5.00.

September 8, 2025

Google

1.5 Pro

$1.25 / $2.50

$5.00 / $10.00

Previous generation. Pricing is for ≤128K / >128K context. Caching (Input $/M): $0.3125 / $0.625.

September 8, 2025

Alibaba

Qwen2.5-Max

$1.60

$6.40

32K context, 8K max output.

September 9, 2025

Alibaba

Qwen-Max

$1.68

$6.72

32K context; Qwen-Max-2025-01-25 version; free 1M tokens for new users (180 days validity).

September 8, 2025

Meta

Llama 3.1 405b (via Deepinfra)

$1.79

128K context, 2K max output.

September 9, 2025

OpenAI

GPT-4.1

$2.00

$8.00

1M input/32K output context.

September 9, 2025

OpenAI

$2.00

$8.00

200K context, 100K max output (alternate pricing from search result 4).

September 9, 2025

xAI

grok-2

$2.00

$10.00

General; 15 rps; reasoning tokens as completion.

September 8, 2025

xAI

grok-2-vision

$2.00

$10.00

Vision capabilities; 10 rps.

September 8, 2025

Perplexity

Sonar Reasoning Pro

$2.00

$8.00

Enhanced multi-step reasoning with web search.

September 8, 2025

Perplexity

Sonar Deep Research

$2.00

$8.00

Exhaustive research; detailed reports. Citation: $2.00/M, Search: $5.00/1K, Reasoning: $3.00/M.

September 8, 2025

OpenAI

GPT-4o

$2.50

$10.00

128K context, 16K max output (alternate pricing from search result 3). Supports Realtime API features like speech-to-speech and image input (Aug 2025 update).

September 9, 2025

Google

2.5 Pro (>200K)

$2.50

$15.00

Large context. Caching (Input $/M): $0.625. Batch (I/O $/M): $1.25 / $7.50.

September 8, 2025

Anthropic

Claude 3.7 Sonnet

$3.00

$15.00

Blends fast responses with deeper reasoning. Prompt Caching (Write/Read $/M): $3.75 / $0.30. Batch (I/O $/M): $1.50 / $7.50.

September 9, 2025

Anthropic

Claude 3.5 Sonnet

$3.00

$15.00

25% price reduction in 2025; balanced speed/intelligence. Prompt Caching (Write/Read $/M): $3.75 / $0.30. Batch (I/O $/M): $1.50 / $7.50.

September 9, 2025

xAI

Grok-3-Preview

$3.00

$15.00

131K context.

September 9, 2025

xAI

Grok 4

$3.00

$15.00

1M input/256K output context.

September 9, 2025

Cohere

Command-R+

$3.00

$15.00

Enterprise-grade model.

September 9, 2025

Anthropic

Claude 4 Sonnet

$3.00

$15.00

Enhanced coding/reasoning; 1M token context for tiered orgs. Prompt Caching (Write/Read $/M): $3.75 / $0.30. Batch (I/O $/M): $1.50 / $7.50.

September 9, 2025

Anthropic

Claude 3 Sonnet

$3.00

$15.00

Previous generation balanced model. Prompt Caching (Write/Read $/M): $3.75 / $0.30. Batch (I/O $/M): $1.50 / $7.50.

September 9, 2025

Perplexity

Sonar Pro

$3.00

$15.00

Advanced search; deeper content understanding; 2x citations vs base.

September 8, 2025

xAI

grok-3

$3.00

$15.00

Standard; 10 rps. Cached Input $/M: $0.75. Live Search: $25/1K Sources.

September 8, 2025

Zhipu AI

GLM-4.5-flash

$3.20

$12.80

Strong reasoning capabilities.

September 9, 2025

OpenAI

GPT-4o (standard pricing)

$5.00

$15.00

Powerful and efficient flagship model for general-purpose tasks.

September 9, 2025

xAI

grok-3-fast

$5.00

$25.00

Fast standard; 10 rps. Cached Input $/M: $1.25. Live Search: $25/1K Sources.

September 8, 2025

OpenAI

GPT-4 Turbo

$10.00

$30.00

Previous generation model, still capable for many applications.

September 9, 2025

OpenAI

o3 (higher tier)

$10.00

$40.00

200K context, 100K max output (from search result 3).

September 9, 2025

OpenAI

o1-preview

$15.00

$60.00

128K context, 32K max output.

September 9, 2025

OpenAI

o1-2024-12-17

$15.00

$60.00

200K context, 100K max output.

September 9, 2025

OpenAI

$15.00

$60.00

200K context, 100K max output.

September 9, 2025

Anthropic

Claude 4.1 Opus

$15.00

$75.00

Latest Opus with enhanced capabilities. Prompt Caching (Write/Read $/M): $18.75 / $1.50. Batch (I/O $/M): $7.50 / $37.50.

September 9, 2025

Anthropic

Claude 4 Opus

$15.00

$75.00

Most powerful; best for highly complex research/analysis. Prompt Caching (Write/Read $/M): $18.75 / $1.50. Batch (I/O $/M): $7.50 / $37.50.

September 9, 2025

Anthropic

Claude 3 Opus

$15.00

$75.00

Previous flagship for complex tasks. Prompt Caching (Write/Read $/M): $18.75 / $1.50. Batch (I/O $/M): $7.50 / $37.50.

September 9, 2025

OpenAI

o3‑pro

$20.00

$80.00

200K context, 100K max output, professional tier.

September 9, 2025

OpenAI

GPT-4

$30.00

$60.00

8K context, 8K max output.

September 9, 2025

OpenAI

GPT-4.5

$75.00

$150.00

128K context, 16K max output, preview model.

September 9, 2025

Alibaba

Qwen3-Coder-Plus

Standard

Limited-time offer from Jul 24, 2025; 25% off with cache hit. Pricing not specified.

September 8, 2025

OpenAI

GPT-5 Series

Premium

Newer, more advanced models with premium pricing for cutting-edge tasks.

September 9, 2025

Platform Fees (OpenRouter)

OpenRouter aggregates models with these fees:

Service/Feature

Cost/Details

Comments (Changes, Notes)

Recency

Zero-Completion Insurance

$0 (automatic)

Protection for zero-token or errored responses; automatically applied to all accounts.

September 9, 2025

Sonoma Alpha Models

Free during alpha (2 M context)

Sonoma Sky/Dusk Alpha are cloaked models; free test access; logging enabled by creator.

September 9, 2025

Horizon Alpha/Beta Models

Free during beta (256 K context)

Horizon Alpha deprecated; Horizon Beta active; both were free test models; logging noted on model pages.

September 9, 2025

Credit Purchase Fee

5.5 % (min $0.80)

Fee when purchasing credits via Stripe; crypto payments fee is 5%.

September 9, 2025

BYOK Fee

5 % of provider cost

Charged against credits when routing with your own provider key(s).

September 9, 2025

Platform Fee on Inference

None (pass-through pricing)

OpenRouter passes through provider pricing without markup on inference; no additional per-inference commission charged by OpenRouter.

September 9, 2025

Acronyms and Terms

To make sense of the terminology, here's a handy reference table:

Acronym/Term

Full Name/Description

Context/Usage in AI Plans

API

Application Programming Interface

Programmatic access to models (e.g., console.anthropic.com for Claude)

Annual

Yearly Billing

Discounted pricing for annual commitments

BYOK

Bring Your Own Key

Use personal API keys on platforms like OpenRouter while paying platform fees

Cloaked Model

A pre-release or experimental model from an undisclosed creator, often for testing purposes.

Models like Sonoma Sky/Dusk Alpha on OpenRouter are released as "cloaked" to gather feedback before a potential public launch.

CoT

Chain of Thought

Step-by-step reasoning in models (e.g., GPT-5 Thinking)

Ctx

Context

Token window size for inputs (e.g., 131K for GLM-4.5)

GDPR

General Data Protection Regulation

EU privacy law compliance in plans

HIPAA

Health Insurance Portability and Accountability Act

US healthcare privacy compliance

I/O

Input/Output

Token pricing separation (e.g., input vs. output rates)

IDE

Integrated Development Environment

Code development tools (e.g., Claude Code in terminal)

INR

Indian Rupee

Currency for regional pricing (e.g., ChatGPT Go)

LLM

Large Language Model

Core AI models (e.g., Claude, GPT) trained on text data

M Tokens

Million Tokens

Usage/pricing unit (~750 words = 1,000 tokens)

MoE

Mixture of Experts

Architecture in some models (e.g., implied in advanced tiers)

Monthly

Monthly Billing

Standard month-to-month pricing

ROI

Return on Investment

Business value metric for premium plans

RPS

Requests Per Second

API rate limits (e.g., 10 rps for grok-3)

SAML

Security Assertion Markup Language

Authentication standard (e.g., ChatGPT Business)

SCIM

System for Cross-domain Identity Management

Automates user provisioning in enterprise plans

SLA

Service Level Agreement

Uptime/performance guarantees in Enterprise plans

SOC 2

Service Organization Control 2

Security compliance framework (e.g., for enterprise data)

SSO

Single Sign-On

Enterprise login for multiple services (e.g., in Claude/ChatGPT Enterprise)

Stealth Model

An experimental or pre-release model from an undisclosed provider, released anonymously for testing

OpenRouter partners with major AI companies to release "testing versions" for feedback collection

TTL

Time To Live

Cache duration (e.g., in prompt caching)

USD

United States Dollar

Primary currency for pricing

VLM

Vision Language Model

Models handling text + images (e.g., GLM-4.5v, grok-2-vision)

ZCI

Zero Completion Insurance

OpenRouter feature that waives charges for zero-token/error responses.

Wrapping Up

AI API Pricing: A 2025 Guide to Tokens, Models, and Costs

Demystifying AI API Pricing in 2025: Tokens, Models, and Costs

AI API Pricing (Non-AI Chat Plans) - FREE

AI API Pricing (Non-AI Chat Plans) - PAID

Platform Fees (OpenRouter)

Acronyms and Terms

Wrapping Up

🔍 Explore More Topics

State of LLMs

AI Coding Tools: Top Picks for 2025

AI Guide 2025: From Fundamentals to Future

AI Explained: A Step-by-Step Guide

LLMs: Top Models of 2025

AI Chat Subscriptions: The Ultimate Guide (2025)

AI API Pricing: A 2025 Guide to Tokens, Models, and Costs

Demystifying AI API Pricing in 2025: Tokens, Models, and Costs

AI API Pricing (Non-AI Chat Plans) - FREE

AI API Pricing (Non-AI Chat Plans) - PAID

Platform Fees (OpenRouter)

Acronyms and Terms

Wrapping Up