Rate limiting

Agentgateway supports two rate limiting modes:

Local rate limiting — enforced per agentgateway instance using a token-bucket algorithm. No external dependencies required.
Global rate limiting — enforced across all instances by delegating to a remote rate limit service (compatible with Envoy’s ratelimit server).

Local rate limiting

Local rate limiting is configured in a route's policies block using the localRateLimit field, which takes a list of token-bucket entries. Each bucket starts with maxTokens tokens and gains tokensPerFill tokens every fillInterval, never exceeding maxTokens. Each request consumes one token. When the bucket is empty, the request is rejected.

# yaml-language-server: $schema=https://agentgateway.dev/schema/config
binds:
- port: 3000
  listeners:
  - routes:
    - policies:
        localRateLimit:
          - maxTokens: 10
            tokensPerFill: 1
            fillInterval: 60s
      backends:
      - mcp:
          targets:
          - name: everything
            stdio:
              cmd: npx
              args: ["@modelcontextprotocol/server-everything"]
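The token-bucket behavior described above can be modelled in a few lines of Python. This is a simulation for intuition, not agentgateway's implementation. The following are assumptions: time is passed in explicitly as seconds, refills are credited in whole steps of tokensPerFill for each elapsed fillInterval, and the class and its snake_case parameters are hypothetical counterparts of the YAML fields.

```python
class TokenBucket:
    """Illustrative token bucket mirroring the localRateLimit fields.

    Time is passed in explicitly (in seconds) so the behavior is deterministic.
    Refills happen in whole steps of `tokens_per_fill` for each complete
    `fill_interval` that has elapsed, and never push the bucket above `max_tokens`.
    """

    def __init__(self, max_tokens: int, tokens_per_fill: int, fill_interval: float) -> None:
        self.max_tokens = max_tokens
        self.tokens_per_fill = tokens_per_fill
        self.fill_interval = fill_interval
        self.tokens = max_tokens  # the bucket starts full
        self.last_fill = 0.0      # time at which the last refill step was credited

    def _refill(self, now: float) -> None:
        # Count complete fill intervals since the last credited refill.
        steps = int((now - self.last_fill) // self.fill_interval)
        if steps > 0:
            self.tokens = min(self.max_tokens, self.tokens + steps * self.tokens_per_fill)
            self.last_fill += steps * self.fill_interval

    def try_acquire(self, now: float) -> bool:
        """Consume one token for a request; False means the request would get 429."""
        self._refill(now)
        if self.tokens > 0:
            self.tokens -= 1
            return True
        return False


# Same settings as the example above: maxTokens 10, tokensPerFill 1, fillInterval 60s.
bucket = TokenBucket(max_tokens=10, tokens_per_fill=1, fill_interval=60.0)
burst = [bucket.try_acquire(now=0.0) for _ in range(11)]
print(burst.count(True))             # 10: the 11th request is rejected
print(bucket.try_acquire(now=59.0))  # False: no refill yet
print(bucket.try_acquire(now=60.0))  # True: one token added at t=60s
```

Passing the clock in rather than calling `time.monotonic()` keeps the sketch testable. A real limiter would read a monotonic clock internally.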

Token-bucket fields

Field	Description
`maxTokens`	Maximum bucket capacity and initial token count; this is the burst limit
`tokensPerFill`	Tokens added at each fill interval, up to `maxTokens`
`fillInterval`	How often the bucket is refilled (e.g. `60s`, `1m`)

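Increase maxTokens to allow larger bursts. To approximate a fixed window of N requests per period, set both maxTokens and tokensPerFill to N and set fillInterval to the period. This is not a sliding window: the whole budget is restored at once at each interval.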

Running the local rate limit example

cargo run -- -f examples/ratelimiting/local/config.yaml

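With maxTokens: 10, tokensPerFill: 1, and fillInterval: 60s, a client can send an initial burst of 10 requests. Further requests return 429 Too Many Requests until a token is added. After that, the sustained rate is one request per minute.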
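When choosing maxTokens, tokensPerFill, and fillInterval together, it helps to compute the largest number of requests a client could get through in a given time window. The sketch below assumes the discrete refill model described above and a bucket that is full at the start of the window. The helper `max_admitted` is hypothetical and not part of agentgateway.

```python
def max_admitted(max_tokens: int, tokens_per_fill: int,
                 fill_interval: float, window: float) -> int:
    """Most requests a client can get through in [0, window] from a full bucket.

    Assumes refill steps of `tokens_per_fill` at fill_interval, 2*fill_interval, ...
    and tokens_per_fill <= max_tokens, so a client that spends every token as soon
    as it arrives never loses tokens to the max_tokens cap.
    """
    refill_steps = int(window // fill_interval)
    return max_tokens + refill_steps * tokens_per_fill


# Example config (10 / 1 / 60s): the initial burst, then about one request per minute.
print(max_admitted(10, 1, 60.0, 59.0))         # 10
print(max_admitted(10, 1, 60.0, 600.0))        # 20
# tokensPerFill == maxTokens (100 per hour): a window spanning a refill admits 2x.
print(max_admitted(100, 100, 3600.0, 3600.0))  # 200
```

The last line shows the usual boundary effect of fixed windows. A client that drains the bucket just before a refill and again just after it can briefly send up to twice maxTokens.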

Global rate limiting

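Global rate limiting delegates enforcement to an external rate limit service, so limits are shared across all agentgateway instances instead of being counted per instance. Agentgateway acts as a client of the Envoy rate limit gRPC protocol. It therefore works with any server that implements that API, such as Envoy's ratelimit server.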

Infrastructure setup

The global rate limit example uses Envoy’s ratelimit server with a Redis backend:

Start Redis

docker run -d --name redis --network host redis:7.4.3

Start the ratelimit server

docker run -d --name ratelimit \
  --network host \
  -e REDIS_URL=127.0.0.1:6379 \
  -e USE_STATSD=false \
  -e LOG_LEVEL=debug \
  -e REDIS_SOCKET_TYPE=tcp \
  -e RUNTIME_ROOT=/data \
  -e RUNTIME_SUBDIRECTORY=ratelimit \
  -v "$(pwd)/examples/ratelimiting/global/ratelimit-config.yaml:/data/ratelimit/config/config.yaml:ro" \
  envoyproxy/ratelimit:3e085e5b \
  /bin/ratelimit -config /data/ratelimit/config/config.yaml
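Before starting agentgateway, you can confirm that both containers are accepting connections. The sketch below uses only the Python standard library. The function `port_open` is hypothetical. The ports are assumptions taken from this example: 6379 from REDIS_URL above, and 8081 because that is the port agentgateway is configured to call in the next section.

```python
import socket


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


if __name__ == "__main__":
    # Ports assumed from this example: Redis (REDIS_URL) and the ratelimit gRPC listener.
    for name, port in [("redis", 6379), ("ratelimit gRPC", 8081)]:
        status = "up" if port_open("127.0.0.1", port) else "not reachable"
        print(f"{name} on 127.0.0.1:{port}: {status}")
```

A successful TCP connection only shows that something is listening. It does not prove that the ratelimit server loaded the config file.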

Start agentgateway

cargo run -- -f examples/ratelimiting/global/config.yaml

Agentgateway configuration

Configure remoteRateLimit on the route policy:

# yaml-language-server: $schema=https://agentgateway.dev/schema/config
binds:
- port: 3000
  listeners:
  - routes:
    - policies:
        remoteRateLimit:
          domain: "agentgateway"
          host: "127.0.0.1:8081"
          failureMode: failOpen
          descriptors:
            - entries:
                - key: "user"
                  value: '"test-user"'
                - key: "tool"
                  value: '"echo"'
              type: "requests"
      backends:
      - mcp:
          targets:
          - name: everything
            stdio:
              cmd: npx
              args: ["@modelcontextprotocol/server-everything"]

Remote rate limit fields

Field	Description
`domain`	The rate limit domain — must match the domain in the ratelimit server config
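`host`	Address of the remote rate limit service's gRPC endpoint (`host:port`)
`failureMode`	Behavior when the service is unreachable: `failOpen` (default; allow) or `failClosed` (deny with 500)
`descriptors`	List of descriptor sets sent with each check; each set has ordered `entries` (a `key` plus a CEL `value` expression) and a `type` (`requests` in this example)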
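Each descriptor set becomes one descriptor in the gRPC check, and its key/value entries keep their order. The sketch below renders that mapping as a plain Python dict. The field names follow the RateLimitRequest message in Envoy's rate limit v3 API (`domain`, `descriptors[].entries[].key/value`, `hits_addend`). It makes no network call and assumes the CEL values have already been evaluated to strings. Sending one hit per request is an assumption for `type: "requests"`. `build_rate_limit_request` is a hypothetical helper.

```python
import json


def build_rate_limit_request(domain: str,
                             descriptors: list[list[tuple[str, str]]],
                             hits_addend: int = 1) -> dict:
    """Shape of a rate limit check as a plain dict (no gRPC call is made).

    Field names follow Envoy's RateLimitRequest message: a domain, a list of
    descriptors, each with ordered key/value entries, and hits_addend.
    `descriptors` holds already-evaluated values; CEL evaluation is not modelled.
    """
    return {
        "domain": domain,
        "descriptors": [
            {"entries": [{"key": key, "value": value} for key, value in entries]}
            for entries in descriptors
        ],
        # One hit per request is assumed here for `type: "requests"`.
        "hits_addend": hits_addend,
    }


# The remoteRateLimit example above: one descriptor set with two entries.
request = build_rate_limit_request(
    "agentgateway",
    [[("user", "test-user"), ("tool", "echo")]],
)
print(json.dumps(request, indent=2))
```

The `domain` value must equal the `domain` at the top of the ratelimit server's config, or no limits will match.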

Failure modes

failOpen (default)

Requests are allowed through when the rate limit service is unavailable. This prevents a rate limit outage from blocking all traffic, matching Envoy's default behavior.

failureMode: failOpen

failClosed

Requests are denied with 500 Internal Server Error when the service is unavailable. Use this when strict rate limiting is required and you prefer to reject traffic rather than allow potentially unlimited requests.

failureMode: failClosed

When failClosed is active and the service cannot be reached, the response is 500, not 429. The request was never found to be over the limit; the rate limit check itself failed.
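The decision described above fits in a small function. This is a hypothetical sketch of the documented behavior, not agentgateway's code. `over_limit` stands for the service's verdict: True for OVER_LIMIT, False for OK, and None when the service could not be reached.

```python
from typing import Optional


def decide_status(over_limit: Optional[bool], failure_mode: str = "failOpen") -> Optional[int]:
    """Return the HTTP status to reject with, or None to let the request through.

    over_limit is the service's verdict: True for OVER_LIMIT, False for OK,
    and None when the rate limit service could not be reached.
    """
    if failure_mode not in ("failOpen", "failClosed"):
        raise ValueError(f"unknown failureMode: {failure_mode}")
    if over_limit is None:
        # Service unreachable: there is no verdict, so the failure mode decides.
        return 500 if failure_mode == "failClosed" else None
    return 429 if over_limit else None


print(decide_status(True))                # 429: over the limit
print(decide_status(None))                # None: failOpen lets the request through
print(decide_status(None, "failClosed"))  # 500: failClosed rejects it
```

A reachable service always decides the outcome. failureMode only matters when there is no verdict.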

Rate limit server configuration

The Envoy ratelimit server uses its own YAML config to define limits. This file corresponds to examples/ratelimiting/global/ratelimit-config.yaml:

domain: agentgateway
descriptors:
  - key: user
    value: "test-user"
    descriptors:
      - key: tool
        value: "echo"
        rate_limit:
          unit: minute
          requests_per_unit: 5
  - key: tool
    value: "echo"
    rate_limit:
      unit: minute
      requests_per_unit: 20

This configuration defines:

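Combined limit: 5 requests/minute for a descriptor whose entries are (user=test-user, tool=echo), in that order
Tool limit: 20 requests/minute for a descriptor consisting of tool=echo alone

The ratelimit server matches each descriptor's entries in order against this tree. The route configured above sends only the combined (user, tool) descriptor, so it is held to the 5 requests/minute limit. To enforce the tool-wide limit as well, add a second descriptor set to remoteRateLimit whose only entry has key tool.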
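The ordered matching can be illustrated with a simplified model of the descriptor tree. This is a sketch, not the ratelimit server's code. It ignores wildcards, shadow mode, and other features. The config above is transcribed by hand into Python structures, and `find_limit` is a hypothetical helper.

```python
from typing import Optional

# ratelimit-config.yaml from above, transcribed by hand into Python structures.
DESCRIPTORS = [
    {
        "key": "user",
        "value": "test-user",
        "descriptors": [
            {"key": "tool", "value": "echo",
             "rate_limit": {"unit": "minute", "requests_per_unit": 5}},
        ],
    },
    {"key": "tool", "value": "echo",
     "rate_limit": {"unit": "minute", "requests_per_unit": 20}},
]


def find_limit(nodes: list, entries: list) -> Optional[dict]:
    """Simplified model of how a descriptor's entries select a limit.

    Entries are matched in order, one tree level per entry. At each level a rule
    with the same key and value is preferred over a key-only rule (no `value`).
    A limit applies only if every entry matches and the last matched rule has a
    rate_limit.
    """
    node = None
    for key, value in entries:
        exact = [n for n in nodes if n["key"] == key and n.get("value") == value]
        key_only = [n for n in nodes if n["key"] == key and "value" not in n]
        matches = exact or key_only
        if not matches:
            return None
        node = matches[0]
        nodes = node.get("descriptors", [])
    return node.get("rate_limit") if node is not None else None


# The descriptor sent by the example route: (user=test-user, tool=echo).
print(find_limit(DESCRIPTORS, [("user", "test-user"), ("tool", "echo")]))
# A descriptor consisting only of tool=echo reaches the 20/minute rule.
print(find_limit(DESCRIPTORS, [("tool", "echo")]))
```

In this model, a descriptor for a different user, such as (user=someone-else, tool=echo), matches no rule at all. It is not counted against the tool-wide limit.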

When a limit is exceeded, the ratelimit server answers OVER_LIMIT and agentgateway rejects the request with 429 Too Many Requests. To watch decisions as they happen:

docker logs -f ratelimit 2>&1 | grep -E 'OVER_LIMIT|OK'

Combining rate limiting with authentication

Rate limiting policies can be combined with JWT authentication (jwtAuth) and MCP authorization (mcpAuthorization) in the same policies block. The jwtAuth policy authenticates requests before rate limits are checked:

policies:
  localRateLimit:
    - maxTokens: 10
      tokensPerFill: 1
      fillInterval: 60s
  jwtAuth:
    issuer: agentgateway.dev
    audiences: [test.agentgateway.dev]
    jwks:
      file: ./manifests/jwt/pub-key
  mcpAuthorization:
    rules:
    - 'mcp.tool.name == "echo"'
    - 'jwt.sub == "test-user" && mcp.tool.name == "add"'
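Because authentication runs before rate limiting, requests that fail JWT validation are rejected before they reach the limiter. The toy pipeline below illustrates that ordering. It is a hypothetical sketch, not agentgateway's request flow. The 401 status for a failed JWT check is an assumption, 200 stands in for "forwarded to the backend", and MCP authorization is left out.

```python
from typing import Callable


def handle(jwt_valid: bool, acquire_token: Callable[[], bool]) -> int:
    """Toy pipeline: authenticate first, then apply the rate limit.

    401 for a failed JWT check is an illustrative assumption; 200 stands in for
    "forwarded to the backend". MCP authorization is left out of this sketch.
    """
    if not jwt_valid:
        return 401  # rejected before the rate limiter is consulted
    if not acquire_token():
        return 429  # authenticated, but no tokens left
    return 200


def make_counter(tokens: int) -> Callable[[], bool]:
    """Hypothetical stand-in for a token bucket that never refills."""
    remaining = tokens

    def acquire() -> bool:
        nonlocal remaining
        if remaining > 0:
            remaining -= 1
            return True
        return False

    return acquire


acquire = make_counter(2)
print([handle(False, acquire) for _ in range(3)])  # unauthenticated: no tokens spent
print([handle(True, acquire) for _ in range(3)])   # two pass, then 429
```

Under this ordering, unauthenticated traffic cannot drain the rate limit budget of authenticated callers.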

CEL expressions for descriptors

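Descriptor values in remoteRateLimit are CEL expressions evaluated against the request context. In the example above, the outer single quotes are YAML quoting and the inner double quotes make a CEL string literal. As a result, '"test-user"' and '"echo"' always evaluate to the constants test-user and echo; they are not matched against the JWT subject or the MCP tool name. To derive a value from the request instead, use an expression such as jwt.sub, the JWT subject variable used in the mcpAuthorization rules above.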
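The difference between a string literal and a variable reference can be shown with a toy evaluator. This is not CEL. It handles only double-quoted literals and dotted paths. `evaluate` and the shape of the request context are hypothetical; the variable names mirror those in the mcpAuthorization rules.

```python
def evaluate(expr: str, context: dict) -> str:
    """Toy stand-in for CEL covering only the two forms discussed above.

    - A double-quoted string literal such as "test-user" evaluates to itself.
    - A dotted path such as jwt.sub is looked up in the request context.
    Real CEL is far richer; this only contrasts constants with variables.
    """
    if len(expr) >= 2 and expr.startswith('"') and expr.endswith('"'):
        return expr[1:-1]
    value = context
    for part in expr.split("."):
        value = value[part]
    return str(value)


# Hypothetical request context; the variable names mirror the mcpAuthorization rules.
ctx = {"jwt": {"sub": "alice"}, "mcp": {"tool": {"name": "add"}}}
print(evaluate('"test-user"', ctx))  # YAML value '"test-user"': the same constant for every caller
print(evaluate("jwt.sub", ctx))      # varies with the caller's token
```

With a literal, every caller shares one counter. With jwt.sub, each subject gets its own counter in the ratelimit server.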

Refer to the telemetry guide to visualize rate limit metrics and traces for your MCP servers.
