Prompt Caching Compared: OpenAI, Claude, Gemini
All three big providers now discount cached input 90%, but the write fees, minimums, and lifetimes differ. Here is which one wins your workload.

Prompt caching is the cheapest 90% you will ever save on an LLM bill, and by mid-2026 all three major providers offer it. The headline discount is identical, which means the real decision hides in the fine print: write fees, minimum sizes, cache lifetimes, and whether you have to manage it yourself.
Quick answer
As of mid-2026, OpenAI, Anthropic, and Google all discount cached input tokens by 90%. The differences are in the mechanics. OpenAI caches automatically with no separate write fee. Anthropic uses explicit cache_control breakpoints and charges a write surcharge (1.25x input for a 5-minute cache, 2x for one hour). Gemini offers both implicit automatic caching and explicit caching with a per-hour storage fee. For high-reuse workloads the write fee washes out and they tie; for low-reuse, OpenAI's no-write-fee model wins slightly.
Key takeaways
- All three discount cached reads by 90%. The read economics are essentially tied.
- The difference is write cost and control. OpenAI is automatic and free to write; Anthropic charges to write; Gemini charges storage per hour.
- High reuse favors everyone equally because the write fee amortizes away.
- Low reuse slightly favors OpenAI thanks to no write surcharge.
- Structure your prompt for caching with the static content first, or you get nothing.
Why prompt caching pays off
Most production prompts repeat a large static block: a system prompt, tool definitions, few-shot examples, a retrieved document. Without caching, you pay full input price to reprocess that block on every call. Caching stores the processed prefix so subsequent calls read it back at a fraction of the cost.
The savings scale with how much of your prompt is reused. An agent that sends the same 8,000-token system prompt on every turn can cut its input bill dramatically, which matters because agentic workflows make many calls per task and input tokens dominate.
The provider mechanics
OpenAI: automatic, no write fee
OpenAI caches automatically. There is no cache_control to set and no separate write line item. The first call is billed at the standard input rate, and subsequent calls that match the same prefix are billed at the cached rate, a 90% discount. Simplicity is the selling point: you get caching without changing your code, as long as your static content sits at the front of the prompt.
Anthropic: explicit, with a write surcharge
Anthropic uses explicit caching. You mark cache breakpoints with cache_control. Writing the cache costs more than a normal input token: 1.25x input for the 5-minute time-to-live, or 2x for the 1-hour TTL. Reads then cost 0.10x input, the same 90% discount. The explicit model gives you control over exactly what is cached and for how long, at the price of a write surcharge and a bit of code.
Google Gemini: implicit and explicit
Gemini supports both. Implicit caching works automatically, similar to OpenAI. Explicit caching lets you cache content deliberately but adds a per-hour storage fee for keeping it alive. Cached tokens are billed at 10% of the input rate, again a 90% discount.

Head to head
| Factor | OpenAI | Anthropic | Google Gemini |
|---|---|---|---|
| Cached read discount | 90% | 90% | 90% |
| Write fee | None | 1.25x (5 min) / 2x (1 hr) | Storage fee per hour |
| Control | Automatic | Explicit breakpoints | Implicit or explicit |
| Cache lifetime | Provider-managed | 5 min or 1 hr TTL | Configurable with storage cost |
| Best when | Low reuse, want zero setup | High reuse, want control | Mixed, want a choice |
The breakeven that actually matters
The write fee only stings if you write more than you read. For high cache-hit workloads (roughly five or more reads per write cycle), Anthropic's write surcharge amortizes away and the providers tie on read economics. For low cache-hit workloads (under about three reads per write), OpenAI's no-write-fee structure wins by a few percent because you are not paying a surcharge you rarely recover.
| Workload | Winner | Why |
|---|---|---|
| Chat with long shared system prompt | Tie | High reuse, write fee amortizes |
| One-off long-document Q&A | OpenAI | Few reads, no write fee to recover |
| Agent with stable tool defs | Tie | Reused every turn |
| Need control over what caches | Anthropic | Explicit breakpoints |
What to do right now
- Put static content first. System prompt, tool definitions, and examples belong at the front so the reusable prefix is as long as possible.
- Measure your reuse ratio. Count reads per write cycle; that number decides whether write fees matter to you.
- Do not fragment the prefix. A single changing token near the top invalidates the whole cache. See our cache-miss fixes in the coding cost guide.
- Stack caching with other levers. Combine it with semantic caching at the app layer and KV cache optimization if you self-host.
- Verify against real usage. Log cache hit rates in production, since real traffic rarely matches the estimate.
- Confirm current pricing before committing; only trust the provider's own pricing page for exact numbers.
Frequently asked questions
Do all three really give the same 90% discount?
On cached reads, yes, as of mid-2026. The read discount is 90% across OpenAI, Anthropic, and Gemini. The differences are in write cost, control, and cache lifetime, not the headline read rate.
Why does Anthropic charge to write the cache?
Because it caches explicitly and holds the prefix for a defined TTL. The write surcharge (1.25x or 2x input) covers that. If you read the cache several times before it expires, the surcharge is easily recovered.
What breaks a cache hit?
Any change to the cached prefix. If a token near the top of the prompt varies per request, the reusable prefix ends there and you cache far less. Keep everything dynamic at the bottom of the prompt.
Is caching worth it for one-off requests?
No. Caching pays off through reuse. A single unique request with no repeat gets no benefit, and on Anthropic you would even pay a write fee for nothing. It shines on repeated prefixes.


