LLM Gateways Explained: Routing AI Traffic in 2026

Why multi-model AI stacks need a gateway in 2026, and how LiteLLM, Portkey, and Kong handle routing, budgets, failover, and audit logs.

Sam CarterJun 12, 2026 8 min read

Cover image for LLM Gateways Explained: Routing AI Traffic in 2026 — Photo: Ars Electronica / flickr (BY-NC-ND 2.0)

By 2026 almost nobody ships a serious AI product on a single model. A typical stack routes cheap, high-volume traffic to a self-hosted open-weight model, sends hard reasoning tasks to a frontier API, and keeps a second vendor on standby for failover. That flexibility comes with a mess: scattered API keys, no central spend tracking, prompts leaking across vendor boundaries, and no audit trail when compliance asks who called what. An LLM gateway is the piece of infrastructure that cleans this up.

Quick answer

An LLM gateway is a reverse proxy that sits between your apps and every model provider, exposing one OpenAI-compatible endpoint. It centralizes authentication, per-team budgets, rate limiting, routing, failover, and logging regardless of which model serves a request. You need one the moment you route to more than one model, enforce budgets across teams, or owe an audit trail. Start with LiteLLM (open-source, self-hosted); move to Portkey or Kong only once you know which managed features you actually use. Watch the added network hop's latency.

Key takeaways

An LLM gateway is a reverse proxy that sits between your apps and every model provider, exposing one OpenAI-compatible endpoint.
It centralizes authentication, rate limiting, spend tracking, routing, failover, and logging regardless of which model serves a request.
LiteLLM is the dominant open-source choice; Portkey and Kong AI Gateway target teams that want managed reliability and richer governance.
You need one the moment you route to more than one model, enforce per-team budgets, or owe an audit trail.
A gateway adds a network hop, so colocate it with your apps and watch the added latency.

What a gateway actually does

Every request from your application flows through a single endpoint instead of hitting OpenAI, Anthropic, or your own GPU box directly. Because the gateway speaks the OpenAI API format, any client library that already works with OpenAI works unchanged, you swap the base URL and key, nothing else.

Behind that single endpoint the gateway handles the unglamorous platform work:

Virtual keys, issue per-team or per-app keys with their own budgets and rate limits, so a runaway script cannot drain your whole quota.
Routing, send each request to a chosen model based on cost, latency, or task complexity rules.
Load balancing, spread traffic across multiple deployments of the same model.
Failover, automatically retry on a fallback provider when the primary returns errors or times out.
Observability, log every prompt, completion, token count, and dollar cost to one place.

A data center hallway with rows of servers connected by routing cables — Photo: David Illig / flickr (BY-NC-SA 2.0)

The three contenders

LiteLLM

LiteLLM is the open-source default. It started as a Python SDK that normalizes calls to 100-plus providers into the OpenAI format, then grew a proxy server, the actual gateway, that any HTTP client can hit. You self-host it, point it at your provider keys in a config file, and get virtual keys, budgets, failover, and logging out of the box. It is free, transparent, and the community is large, which matters when you hit an edge case at 2 a.m.

Portkey

Portkey targets teams that would rather not babysit gateway infrastructure. It leans into reliability features, automatic retries, request caching, fallbacks, and a polished observability dashboard, as a managed service, with a self-host option for regulated environments. If your priority is uptime and you want guardrails and analytics without building them, Portkey is the comfortable pick.

Kong AI Gateway

Kong comes at the problem from the API-management world. If your organization already runs Kong for its regular API traffic, the AI Gateway plugin extends that same governance, auth, rate limiting, policy enforcement, to model calls. It is the natural choice for enterprises that want AI traffic governed by the same control plane as everything else.

Here is how the three line up so you can match one to your situation:

Gateway	Model	Best for	Watch out for
LiteLLM	Open-source, self-hosted	Teams that want control and 100+ providers free	You run and scale it yourself
Portkey	Managed (self-host option)	Uptime, caching, polished analytics with no ops	Recurring cost; data path through a vendor unless self-hosted
Kong AI Gateway	API-management plugin	Enterprises already standardized on Kong	Heavier to adopt if you do not run Kong already

Tip

Start with LiteLLM even if you expect to outgrow it. Running an open gateway first teaches you exactly which features you actually use, so you can evaluate a managed option against real needs instead of a feature checklist.

When you actually need one

A single model behind a single key does not need a gateway, that is just overhead. The value appears when complexity does:

More than one model or provider. The moment you route between, say, an open-weight model and a frontier API, you want one place to manage both.
Multiple teams or apps. Per-team budgets and rate limits stop one group's experiment from blowing the shared bill.
Compliance. A central, immutable log of every prompt and response is far easier to produce for an auditor than logs scattered across vendor dashboards.
Reliability targets. Automatic failover across providers turns a vendor outage from an incident into a non-event.

The cost models for this kind of multi-model setup matter too; the economics in the tokenmaxxing shift explain why routing cheap traffic to smaller models pays off, and AI coding agent costs shows how prompt caching compounds those savings.

Watch the latency tax

A gateway is another network hop. If it lives in a different region from your application servers, you add round-trip time to every call, painful for latency-sensitive chat. Colocate the gateway with your apps, enable response caching for repeated prompts, and benchmark p95 latency before and after you introduce it. The platform benefits are real, but they are not free.

What to do right now

If your AI stack has outgrown a single key, set up a gateway this way:

Confirm you actually need one: more than one model or provider, multiple teams sharing a bill, an audit requirement, or a reliability target. If none apply, skip it.
Spin up LiteLLM as a proxy first, point it at your existing provider keys, and swap your app's base URL and key to the gateway. Nothing else in your code should change.
Issue virtual keys per team or app with their own budgets so one runaway script cannot drain your quota.
Configure failover to a second provider and test it by forcing the primary to error.
Colocate the gateway with your app servers and benchmark p95 latency before and after, so the convenience does not silently cost you response time.
Only move to Portkey or Kong once you know, from running LiteLLM, which managed features you actually use.

Frequently asked questions

Is an LLM gateway the same as an LLM router?

They overlap. A router specifically decides which model handles each request, often using cost or quality heuristics. A gateway is the broader platform that includes routing plus auth, budgets, logging, and failover. Most gateways contain a router; not every standalone router is a full gateway.

Does a gateway add meaningful latency?

It adds one network hop, typically single-digit milliseconds when colocated with your apps, more if it sits in another region. Caching and load balancing can offset this for repeated or high-volume traffic, but you should measure p95 latency rather than assume.

Can I self-host instead of using a SaaS gateway?

Yes. LiteLLM is built to self-host, and both Portkey and Kong offer on-prem deployments. Self-hosting keeps prompts inside your own infrastructure, which is often a hard requirement in regulated industries.

Do I still need a gateway with only one provider?

Usually not. The benefits scale with the number of models, teams, and compliance demands. With one model, one key, and one team, a gateway is overhead you can skip until your stack grows.

#ai#infrastructure