Reliable Structured Outputs from LLMs in 2026: Stop Parsing JSON with Regex

Native structured outputs now hit 99.9% schema compliance across the major providers. Here is how they work, and why schema-valid still isn't correct.

Sam CarterJun 22, 2026 7 min read

Cover image for Reliable Structured Outputs from LLMs in 2026: Stop Parsing JSON with Regex — Photo: Si-MOCs / flickr (BY-NC-SA 2.0)

For years, getting clean JSON out of a language model meant praying, then writing a regex to salvage whatever came back. You would ask for JSON, get a friendly "Sure! Here is your data:" wrapped around a slightly malformed object, and spend an afternoon writing parsers for failure modes. In 2026 that era is over. OpenAI, Anthropic, and Google Gemini all ship native structured outputs that enforce your schema at generation time, and the reliability numbers are no longer aspirational. The remaining trap is subtler: schema-valid is not the same as correct.

Quick answer

Stop salvaging JSON with regex: use native, schema-enforced structured outputs, which now hit roughly 99.9 percent structural compliance because constrained decoding masks any token that would break your schema. Define your output shape as a Pydantic (Python) or Zod (TypeScript) model and pass it to the provider's structured-outputs or tool-use API rather than bare JSON mode (which still fails 8 to 15 percent of the time). Anthropic routes this through tool use: your output shape is the tool's input schema. The catch is that schema-valid is not semantically correct, so add business-logic validators and sample outputs for accuracy.

Key takeaways

Native structured outputs hit roughly 99.9% schema compliance (OpenAI), with Anthropic tool use at 99.8% and Gemini at 99.7%.
The mechanism is constrained decoding: a finite state machine masks invalid tokens so the model literally cannot emit JSON that breaks your schema.
JSON mode without schema enforcement still fails 8 to 15% of the time. Use schema-enforced structured outputs, not bare JSON mode.
Anthropic implements structured output through tool use (function calling): define a tool with a JSON schema and the model "calls" it.
Structural validity does not guarantee semantic correctness. The model can still put the wrong value in the right field.

How constrained decoding works

The old approach asked the model nicely and hoped. Native structured outputs change the rules of generation itself. As the model produces tokens, a finite state machine derived from your JSON schema masks out any token that would make the output invalid. If the schema says the next thing must be a closing brace or a comma, the model is only allowed to emit those. The result is output that is structurally guaranteed to match your schema, with reported failure rates below 0.1%.

That is a categorical improvement over "JSON mode," which merely nudges the model toward JSON-shaped text without enforcing your specific schema. JSON mode still fails 8 to 15% of the time, wrong types, missing required fields, hallucinated keys, which is exactly the tax structured outputs remove.

# Pydantic-defined schema, enforced at generation time
from pydantic import BaseModel

class Invoice(BaseModel):
    vendor: str
    amount_usd: float
    due_date: str   # ISO 8601

# The provider enforces this schema during decoding; the
# returned object is guaranteed to parse into Invoice.

The provider differences that matter

All three majors support it, but the mental model differs. OpenAI exposes a dedicated structured-outputs mode you attach a JSON schema (or Pydantic/Zod model) to. Google Gemini takes a schema directly. Anthropic routes structured output through tool use: you define a tool whose input schema is your desired output shape, and the model returns its "tool call" arguments as your structured data. Functionally similar, but if you are building cross-provider, you wire Anthropic through the function-calling path rather than a separate JSON mode.

Here is how the approaches compare so you wire each one correctly:

Approach	How you pass the schema	Reported structural reliability
OpenAI structured outputs	JSON schema, Pydantic, or Zod model	~99.9%
Anthropic tool use	Tool input schema (your output shape)	~99.8%
Google Gemini	Schema passed directly	~99.7%
Bare JSON mode (any provider)	Prompt nudge only, no enforcement	Fails 8 to 15%

If you run models locally, the same idea applies. We covered the local-first version of this in Ollama structured outputs with JSON schema, where constrained decoding gives you the same schema guarantee without a hosted API.

A diagram showing a JSON schema constraining an LLM's token generation into valid output — Photo: Bob Mical / flickr (BY-NC 2.0)

The trap: valid is not correct

Here is the failure mode that bites teams who relax once the JSON parses. Structured outputs guarantee that the shape is right. They guarantee nothing about whether the content is right. The model can return a perfectly schema-valid object in which amount_usd is the invoice number instead of the total, or due_date is the issue date. It misparsed the source, but the output is structurally flawless, so your validator waves it through.

Warning

Schema validation catches structural errors, never semantic ones. A field can be the correct type and shape and still hold the wrong value. Always add business-logic validation (range checks, cross-field consistency, sanity bounds) on top of schema enforcement.

This is why structured outputs reduce, but do not remove, the need for evaluation. The structure is free now; the correctness still has to be measured. For pipelines where a wrong-but-valid field has real cost, score extractions against known-good cases the same way you would judge any model output, an LLM-as-a-judge evals harness generalizes here, and feeds naturally into the agent observability traces where these tool calls show up.

A practical setup

Define your output shape as a Pydantic (Python) or Zod (TypeScript) model, not a hand-written schema string.
Pass it to the provider's native structured-outputs or tool-use API, not bare JSON mode.
Parse the response back into your typed model so type errors surface immediately.
Add semantic validators: ranges, enums, cross-field rules, the checks a schema cannot express.
Sample real outputs and score them for correctness; structural pass rate is not accuracy.

Frequently asked questions

Is JSON mode the same as structured outputs?

No. JSON mode only encourages JSON-shaped text and still fails 8 to 15% of the time on type and field errors. Structured outputs enforce your specific schema during decoding and hit roughly 99.9% structural compliance. Always prefer schema-enforced structured outputs.

Which providers support native structured outputs?

OpenAI, Anthropic, and Google Gemini all do as of early 2026. OpenAI and Gemini take a schema directly; Anthropic implements it through tool use, where your output shape is the tool's input schema.

If output is schema-valid, is it correct?

Not necessarily. Schema enforcement guarantees the structure, not the meaning. The model can place the wrong value in a correctly typed field. You still need business-logic validation and sampling to catch semantic errors.

Should I use Pydantic or Zod?

Use whichever matches your stack, Pydantic for Python, Zod for TypeScript. Defining the schema as a typed model rather than a raw JSON schema string gives you validation and editor support for free, and both feed cleanly into the provider APIs.

The takeaway

Reliable JSON from LLMs is a solved problem in 2026, use native, schema-enforced structured outputs and you get near-perfect structural compliance. Just do not confuse that with correctness. Validate the meaning with business rules and sampling, because a wrong value in a perfectly shaped field is the one error schema enforcement will never catch.

#ai#json#tool-use