Demystifying AI API Pricing in 2025: Tokens, Models, and Costs
Demystifying AI API Pricing in 2025: Tokens, Models, and Costs
Posted on September 9, 2025 by AI Insights Blog
API access to AI models powers everything from apps to enterprise systems, but pricing can be complex with token-based billing. In this post, we cover the latest AI API pricing (non-chat plans), sorted by input cost per million tokens from low to high. For ties, the most recent models come first.
Data is up-to-date as of September 9, 2025. We've included recent stealth models from OpenRouter (like Sonoma and Horizon series) in the free tier, as they're in alpha/beta testing. For subscription-based chat plans, see our other post on AI Chat Subscription Plans in 2025.
We've added notes on OpenAI's Realtime API updates from August 28, 2025, which introduce advanced speech-to-speech capabilities, MCP server support, image input, and SIP phone calling—potentially impacting API usage in voice agents (reflected in relevant comments where applicable).
AI API Pricing (Non-AI Chat Plans) - FREE
Free APIs are great for testing, often with logging for feedback.
Provider | Model | Input $/M Tokens | Output $/M Tokens | Comments (Pricing Details, Context, Notes) | Recency |
---|---|---|---|---|---|
xAI | grok code 1 | $0.00 | $0.00 | Free via launch partners through noon PT on Sep 10, 2025; then $0.20/M input and $1.50/M output apply. | September 9, 2025 |
OpenRouter | Sonoma Sky Alpha | $0.00 | $0.00 | Cloaked “stealth” frontier model; 2,000,000-token context, supports image inputs and parallel tool calling; free during testing; prompts and completions logged for feedback. vision + tools; free test with logging. | September 9, 2025 |
OpenRouter | Sonoma Dusk Alpha | $0.00 | $0.00 | Cloaked “stealth” frontier model optimized for speed; 2,000,000-token context, supports image inputs and parallel tool calling; free during testing; prompts and completions logged for feedback. (faster); vision + tools; free test with logging. | September 9, 2025 |
OpenRouter | Horizon Beta | $0.00 | $0.00 | 256K-token context cloaked model; improved successor to Horizon Alpha; free during testing; prompts/completions logged. | September 9, 2025 |
OpenRouter | Horizon Alpha | $0.00 | $0.00 | 256K-token context early cloaked model; deprecated in favor of Horizon Beta. | September 9, 2025 |
OpenRouter | Cypher Alpha | $0.00 | $0.00 | Cloaked all-purpose model; 1,000,000-token context; free testing; logging enabled by provider. | September 9, 2025 |
AI API Pricing (Non-AI Chat Plans) - PAID
Paid APIs scale with usage, often with caching discounts. Note: OpenAI's Realtime API (August 2025 update) adds speech-to-speech and voice agent features, billed at model rates—check for integration in models like GPT-4o.
Provider | Model | Input $/M Tokens | Output $/M Tokens | Comments (Pricing Details, Context, Notes) | Recency |
---|---|---|---|---|---|
Zhipu AI | GLM-4.5-airx | $0.02 | $0.06 | Ultra-fast, lowest cost model. | September 9, 2025 |
OpenAI | GPT-5 nano | $0.05 | $0.40 | Entry-level GPT-5 model; 400K input/128K output context. | September 9, 2025 |
Meta | Llama 3.2 11b Vision (via Deepinfra) | $0.055 | $0.055 | 128K context, 2K max output, vision capabilities. | September 9, 2025 |
Alibaba | Qwen-Turbo | $0.0525 | $0.21 | 1M context; extremely low-cost; per 1K: $0.00005/$0.0002. | September 8, 2025 |
DeepSeek | deepseek-reasoner | $0.07 | $1.68 | Input pricing is for cache hits; cache miss is $0.56/M. | September 8, 2025 |
DeepSeek | deepseek-chat | $0.07 | $1.68 | Input pricing is for cache hits; cache miss is $0.56/M. Off-peak discounts ended Sep 5, 2025. | September 8, 2025 |
OpenAI | GPT-4.1 nano | $0.10 | $0.40 | Lightweight GPT-4.1 variant; 1M input/32K output context. | September 9, 2025 |
2.5 Flash-Lite | $0.10 - $0.30 | $0.40 | Economical. Caching (Input $/M): $0.025-$0.125. | September 8, 2025 | |
2.0 Flash | $0.10 - $0.70 | $0.40 | Text/img/video/audio. Live API: $0.35-$2.10 in / $1.50-$8.50 out. Caching (Input $/M): $0.025-$0.175. Batch (I/O $/M): $0.05 / $0.20. | September 8, 2025 | |
OpenAI | GPT-4o Mini | $0.15 | $0.60 | Cost-effective and fast option for high-volume tasks. | September 9, 2025 |
Zhipu AI | GLM-4.5-air | $0.16 | $1.07 | Cost-effective lightweight model. | September 9, 2025 |
Meta | Llama 3.3 70b (via Deepinfra) | $0.23 | $0.40 | 128K context, 2K max output. | September 9, 2025 |
OpenAI | GPT-5 mini | $0.25 | $2.00 | Mid-tier GPT-5 model; 400K input/128K output context. | September 9, 2025 |
Anthropic | Claude 3 Haiku | $0.25 | $1.25 | Most economical; fastest and most compact. Prompt Caching (Write/Read $/M): $0.30 / $0.03. Batch (I/O $/M): $0.125 / $0.625. | September 9, 2025 |
DeepSeek | DeepSeek-V3 | $0.27 | $1.10 | 64K context, 8K max output. | September 9, 2025 |
Alibaba | QVQ-72B-Preview | $0.2756 | $0.5513 | Preview version; up to 97% price cuts reported. | September 8, 2025 |
xAI | grok-3-mini | $0.30 | $0.50 | Budget model; 8 rps. Cached Input $/M: $0.075. Live Search: $25/1K Sources. | September 8, 2025 |
2.5 Flash | $0.30 - $1.00 | $2.50 | Text/img/video/audio. Live API: $0.50-$3.00 in / $2.00-$12.00 out. Caching (Input $/M): $0.075-$0.25. | September 8, 2025 | |
Alibaba | Qwen2.5 72B/7B | $0.30 - $0.35 | $0.40 | Large/small options; ~$0.0004 per 1K for Plus equivalents. | September 8, 2025 |
Zhipu AI | GLM-4.5 | $0.33 | $1.32 | 131K context. Zhipu is actively promoting migration for former Claude users, offering 20 million free tokens. | September 9, 2025 |
Meta | Llama 3.2 90b Vision (via Deepinfra) | $0.35 | $0.40 | 128K context, 2K max output, vision capabilities. | September 9, 2025 |
OpenAI | GPT-4.1 mini | $0.40 | $1.60 | Smaller GPT-4.1 variant; 1M input/32K output context. | September 9, 2025 |
Alibaba | Qwen-VL-Max | $0.41 | Similar to Input | Vision model. Price was listed as "$0.41 per 1K" which is assumed to be a typo for "$0.41 per 1M tokens". | September 8, 2025 |
Alibaba | Qwen-Plus | $0.42 | $1.26 | 131K context; tiered increases >128K; non-thinking ≤128K: $0.115/$0.287. | September 8, 2025 |
OpenAI | GPT-3.5 Turbo | $0.50 | $1.50 | 16K context, 4K max output. | September 9, 2025 |
Zhipu AI | GLM-4.5v | $0.50 | $1.80 | Vision model with 65.5K context. | September 9, 2025 |
DeepSeek | DeepSeek-R1 | $0.55 | $2.19 | 64K context, 8K max output. | September 9, 2025 |
Meta | Llama 3 70b (via Deepinfra/Groq) | $0.59 | $0.79 | 8K context, 2K max output. | September 9, 2025 |
xAI | grok-3-mini-fast | $0.60 | $4.00 | Faster budget; 3 rps. Cached Input $/M: $0.15. Live Search: $25/1K Sources. | September 8, 2025 |
Alibaba | Qwen3 235B Thinking | $0.735 | $8.82 | Thinking mode; flagship 235B; $0.70/$2.80 non-thinking. | September 8, 2025 |
Anthropic | Claude 3.5 Haiku | $0.80 | $4.00 | Economical; near-instantaneous responses. Prompt Caching (Write/Read $/M): $1.00 / $0.08. Batch (I/O $/M): $0.40 / $2.00. | September 9, 2025 |
Alibaba | Qwen3-Max-Preview (New) | $0.861+ | $3.441+ | Tiered pricing based on context (0-32K, 32K-128K, 128K-252K). New 1T+ parameter model; 262K context. | September 6, 2025 |
Perplexity | Sonar | $1.00 | $1.00 | Lightweight, cost-effective; quick facts, news. | September 8, 2025 |
Perplexity | Sonar Reasoning | $1.00 | $5.00 | Quick problem-solving with step-by-step logic. | September 8, 2025 |
OpenAI | o1-mini | $1.10 | $4.40 | 128K context, 65K max output, reasoning-optimized. | September 9, 2025 |
OpenAI | o3-mini | $1.10 | $4.40 | 200K context, 100K max output. | September 9, 2025 |
OpenAI | o4-mini | $1.10 | $4.40 | 200K context, 100K max output. | September 9, 2025 |
OpenAI | GPT-5 | $1.25 | $10.00 | Latest GPT-5 model; 400K input/128K output context. | September 9, 2025 |
2.5 Pro (≤200K) | $1.25 | $10.00 | Text/img/video; standard context. Caching (Input $/M): $0.31. Batch (I/O $/M): $0.625 / $5.00. | September 8, 2025 | |
1.5 Pro | $1.25 / $2.50 | $5.00 / $10.00 | Previous generation. Pricing is for ≤128K / >128K context. Caching (Input $/M): $0.3125 / $0.625. | September 8, 2025 | |
Alibaba | Qwen2.5-Max | $1.60 | $6.40 | 32K context, 8K max output. | September 9, 2025 |
Alibaba | Qwen-Max | $1.68 | $6.72 | 32K context; Qwen-Max-2025-01-25 version; free 1M tokens for new users (180 days validity). | September 8, 2025 |
Meta | Llama 3.1 405b (via Deepinfra) | $1.79 | $1.79 | 128K context, 2K max output. | September 9, 2025 |
OpenAI | GPT-4.1 | $2.00 | $8.00 | 1M input/32K output context. | September 9, 2025 |
OpenAI | o3 | $2.00 | $8.00 | 200K context, 100K max output (alternate pricing from search result 4). | September 9, 2025 |
xAI | grok-2 | $2.00 | $10.00 | General; 15 rps; reasoning tokens as completion. | September 8, 2025 |
xAI | grok-2-vision | $2.00 | $10.00 | Vision capabilities; 10 rps. | September 8, 2025 |
Perplexity | Sonar Reasoning Pro | $2.00 | $8.00 | Enhanced multi-step reasoning with web search. | September 8, 2025 |
Perplexity | Sonar Deep Research | $2.00 | $8.00 | Exhaustive research; detailed reports. Citation: $2.00/M, Search: $5.00/1K, Reasoning: $3.00/M. | September 8, 2025 |
OpenAI | GPT-4o | $2.50 | $10.00 | 128K context, 16K max output (alternate pricing from search result 3). Supports Realtime API features like speech-to-speech and image input (Aug 2025 update). | September 9, 2025 |
2.5 Pro (>200K) | $2.50 | $15.00 | Large context. Caching (Input $/M): $0.625. Batch (I/O $/M): $1.25 / $7.50. | September 8, 2025 | |
Anthropic | Claude 3.7 Sonnet | $3.00 | $15.00 | Blends fast responses with deeper reasoning. Prompt Caching (Write/Read $/M): $3.75 / $0.30. Batch (I/O $/M): $1.50 / $7.50. | September 9, 2025 |
Anthropic | Claude 3.5 Sonnet | $3.00 | $15.00 | 25% price reduction in 2025; balanced speed/intelligence. Prompt Caching (Write/Read $/M): $3.75 / $0.30. Batch (I/O $/M): $1.50 / $7.50. | September 9, 2025 |
xAI | Grok-3-Preview | $3.00 | $15.00 | 131K context. | September 9, 2025 |
xAI | Grok 4 | $3.00 | $15.00 | 1M input/256K output context. | September 9, 2025 |
Cohere | Command-R+ | $3.00 | $15.00 | Enterprise-grade model. | September 9, 2025 |
Anthropic | Claude 4 Sonnet | $3.00 | $15.00 | Enhanced coding/reasoning; 1M token context for tiered orgs. Prompt Caching (Write/Read $/M): $3.75 / $0.30. Batch (I/O $/M): $1.50 / $7.50. | September 9, 2025 |
Anthropic | Claude 3 Sonnet | $3.00 | $15.00 | Previous generation balanced model. Prompt Caching (Write/Read $/M): $3.75 / $0.30. Batch (I/O $/M): $1.50 / $7.50. | September 9, 2025 |
Perplexity | Sonar Pro | $3.00 | $15.00 | Advanced search; deeper content understanding; 2x citations vs base. | September 8, 2025 |
xAI | grok-3 | $3.00 | $15.00 | Standard; 10 rps. Cached Input $/M: $0.75. Live Search: $25/1K Sources. | September 8, 2025 |
Zhipu AI | GLM-4.5-flash | $3.20 | $12.80 | Strong reasoning capabilities. | September 9, 2025 |
OpenAI | GPT-4o (standard pricing) | $5.00 | $15.00 | Powerful and efficient flagship model for general-purpose tasks. | September 9, 2025 |
xAI | grok-3-fast | $5.00 | $25.00 | Fast standard; 10 rps. Cached Input $/M: $1.25. Live Search: $25/1K Sources. | September 8, 2025 |
OpenAI | GPT-4 Turbo | $10.00 | $30.00 | Previous generation model, still capable for many applications. | September 9, 2025 |
OpenAI | o3 (higher tier) | $10.00 | $40.00 | 200K context, 100K max output (from search result 3). | September 9, 2025 |
OpenAI | o1-preview | $15.00 | $60.00 | 128K context, 32K max output. | September 9, 2025 |
OpenAI | o1-2024-12-17 | $15.00 | $60.00 | 200K context, 100K max output. | September 9, 2025 |
OpenAI | o1 | $15.00 | $60.00 | 200K context, 100K max output. | September 9, 2025 |
Anthropic | Claude 4.1 Opus | $15.00 | $75.00 | Latest Opus with enhanced capabilities. Prompt Caching (Write/Read $/M): $18.75 / $1.50. Batch (I/O $/M): $7.50 / $37.50. | September 9, 2025 |
Anthropic | Claude 4 Opus | $15.00 | $75.00 | Most powerful; best for highly complex research/analysis. Prompt Caching (Write/Read $/M): $18.75 / $1.50. Batch (I/O $/M): $7.50 / $37.50. | September 9, 2025 |
Anthropic | Claude 3 Opus | $15.00 | $75.00 | Previous flagship for complex tasks. Prompt Caching (Write/Read $/M): $18.75 / $1.50. Batch (I/O $/M): $7.50 / $37.50. | September 9, 2025 |
OpenAI | o3‑pro | $20.00 | $80.00 | 200K context, 100K max output, professional tier. | September 9, 2025 |
OpenAI | GPT-4 | $30.00 | $60.00 | 8K context, 8K max output. | September 9, 2025 |
OpenAI | GPT-4.5 | $75.00 | $150.00 | 128K context, 16K max output, preview model. | September 9, 2025 |
Alibaba | Qwen3-Coder-Plus | Standard | Standard | Limited-time offer from Jul 24, 2025; 25% off with cache hit. Pricing not specified. | September 8, 2025 |
OpenAI | GPT-5 Series | Premium | Premium | Newer, more advanced models with premium pricing for cutting-edge tasks. | September 9, 2025 |
Platform Fees (OpenRouter)
OpenRouter aggregates models with these fees:
Service/Feature | Cost/Details | Comments (Changes, Notes) | Recency |
---|---|---|---|
Zero-Completion Insurance | $0 (automatic) | Protection for zero-token or errored responses; automatically applied to all accounts. | September 9, 2025 |
Sonoma Alpha Models | Free during alpha (2 M context) | Sonoma Sky/Dusk Alpha are cloaked models; free test access; logging enabled by creator. | September 9, 2025 |
Horizon Alpha/Beta Models | Free during beta (256 K context) | Horizon Alpha deprecated; Horizon Beta active; both were free test models; logging noted on model pages. | September 9, 2025 |
Credit Purchase Fee | 5.5 % (min $0.80) | Fee when purchasing credits via Stripe; crypto payments fee is 5%. | September 9, 2025 |
BYOK Fee | 5 % of provider cost | Charged against credits when routing with your own provider key(s). | September 9, 2025 |
Platform Fee on Inference | None (pass-through pricing) | OpenRouter passes through provider pricing without markup on inference; no additional per-inference commission charged by OpenRouter. | September 9, 2025 |
Acronyms and Terms
To make sense of the terminology, here's a handy reference table:
Acronym/Term | Full Name/Description | Context/Usage in AI Plans |
---|---|---|
API | Application Programming Interface | Programmatic access to models (e.g., console.anthropic.com for Claude) |
Annual | Yearly Billing | Discounted pricing for annual commitments |
BYOK | Bring Your Own Key | Use personal API keys on platforms like OpenRouter while paying platform fees |
Cloaked Model | A pre-release or experimental model from an undisclosed creator, often for testing purposes. | Models like Sonoma Sky/Dusk Alpha on OpenRouter are released as "cloaked" to gather feedback before a potential public launch. |
CoT | Chain of Thought | Step-by-step reasoning in models (e.g., GPT-5 Thinking) |
Ctx | Context | Token window size for inputs (e.g., 131K for GLM-4.5) |
GDPR | General Data Protection Regulation | EU privacy law compliance in plans |
HIPAA | Health Insurance Portability and Accountability Act | US healthcare privacy compliance |
I/O | Input/Output | Token pricing separation (e.g., input vs. output rates) |
IDE | Integrated Development Environment | Code development tools (e.g., Claude Code in terminal) |
INR | Indian Rupee | Currency for regional pricing (e.g., ChatGPT Go) |
LLM | Large Language Model | Core AI models (e.g., Claude, GPT) trained on text data |
M Tokens | Million Tokens | Usage/pricing unit (~750 words = 1,000 tokens) |
MoE | Mixture of Experts | Architecture in some models (e.g., implied in advanced tiers) |
Monthly | Monthly Billing | Standard month-to-month pricing |
ROI | Return on Investment | Business value metric for premium plans |
RPS | Requests Per Second | API rate limits (e.g., 10 rps for grok-3) |
SAML | Security Assertion Markup Language | Authentication standard (e.g., ChatGPT Business) |
SCIM | System for Cross-domain Identity Management | Automates user provisioning in enterprise plans |
SLA | Service Level Agreement | Uptime/performance guarantees in Enterprise plans |
SOC 2 | Service Organization Control 2 | Security compliance framework (e.g., for enterprise data) |
SSO | Single Sign-On | Enterprise login for multiple services (e.g., in Claude/ChatGPT Enterprise) |
Stealth Model | An experimental or pre-release model from an undisclosed provider, released anonymously for testing | OpenRouter partners with major AI companies to release "testing versions" for feedback collection |
TTL | Time To Live | Cache duration (e.g., in prompt caching) |
USD | United States Dollar | Primary currency for pricing |
VLM | Vision Language Model | Models handling text + images (e.g., GLM-4.5v, grok-2-vision) |
ZCI | Zero Completion Insurance | OpenRouter feature that waives charges for zero-token/error responses. |
Wrapping Up
API pricing is token-driven, so optimize for caching to save costs. With updates like OpenAI's Realtime API, voice and multimodal features are becoming more accessible. Stay tuned for more innovations!