LLM context
The llm context object is available in CEL expressions when Agentgateway is proxying requests to an AI backend (backend.type == "ai"). It exposes information about the LLM request and response, including the model used, token counts, and the raw prompt and completion.
The llm object is only present when using an ai backend. Token count fields such as llm.inputTokens and llm.outputTokens are only populated after the response is received from the LLM provider.
Core fields
Whether the LLM response is being streamed.
The model name requested by the client. This may differ from the model that actually served the response (see llm.responseModel).
The model that actually served the LLM response. May differ from llm.requestModel if the provider mapped the requested model to a different version.
The name of the LLM provider handling the request.
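As a rough illustration of the model fields above, the following CEL expression is true when the provider served a different model than the one the client asked for. It uses only llm.requestModel and llm.responseModel, which are named in this section. Where the expression is placed (for example, a logging field) is an assumption, and it is only meaningful once the response is available.

```cel
// true when the provider mapped the requested model to a different one
llm.requestModel != llm.responseModel
```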
Token counts
Token count fields are populated from the LLM provider's response and are available after the response is received (for example, in logging and post-response authorization policies).
The number of tokens in the input prompt as reported by the provider.
The number of input tokens served from the provider’s prompt cache. These represent cost savings.
The number of tokens written to the provider’s prompt cache. These represent additional cost for cache creation.
llm.cacheCreationInputTokens is not present when using OpenAI. It is specific to providers that support explicit cache creation (such as Anthropic).
The number of tokens in the LLM response.
The number of reasoning tokens in the output. Only populated for models that support extended reasoning (such as o1 or Claude with thinking enabled).
The total number of tokens for the request (input + output).
The number of tokens counted when using the token counting endpoint. These are not counted as input tokens since the token counting endpoint does not consume tokens.
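Because some token fields are absent for certain providers, an expression that reads them may need a guard. A minimal sketch, assuming the standard CEL has() macro is available in Agentgateway expressions:

```cel
// cache-creation tokens, falling back to 0 for providers that omit the field
has(llm.cacheCreationInputTokens) ? llm.cacheCreationInputTokens : 0
```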
Prompt and completion
The prompt sent to the LLM as an array of chat messages. Each message has role and content fields.
The completion returned by the LLM as an array of strings.
Parameters
The llm.params object contains the inference parameters from the LLM request.
The parameters for the LLM request.
llmRequest
The raw LLM request before any LLM policy processing. This is only available during LLM policy evaluation. Policies that run after the LLM policy, such as logging policies, will not have this field even for LLM requests.
Use llmRequest when you need access to the unmodified request before transformations are applied.
Examples
Rate limiting by total token count
Use token counts as the rate limiting key to enforce per-user token budgets. In a rate limiting policy, you can use CEL to select the dimension to limit on:
Combined with threshold checks in authorization policies:
Log the model used and token breakdown
In a logging policy, define named fields using CEL expressions:
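A sketch of the expressions such fields might use. The field names in the comments (model, input_tokens, output_tokens) are hypothetical, and the surrounding policy configuration is omitted; only the llm fields themselves come from this page.

```cel
// hypothetical log field "model": the model that served the response
llm.responseModel

// hypothetical log field "input_tokens"
llm.inputTokens

// hypothetical log field "output_tokens"
llm.outputTokens
```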
Block high-temperature requests
Reject requests with a temperature above a threshold in an authorization policy:
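A minimal sketch of such a check. The field name llm.params.temperature and the 1.0 threshold are assumptions, not taken from this page. Whether a true result allows or denies the request depends on how the authorization rule is configured.

```cel
// true when a temperature is set and exceeds the (hypothetical) threshold;
// "temperature" is an assumed member of llm.params
has(llm.params.temperature) && llm.params.temperature > 1.0
```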
Enforce max token limits
Reject requests that request more tokens than your policy allows:
Restrict access to specific models
Only allow requests for approved models:
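One way to express an allow-list is the CEL in operator over a list literal. The model names below are placeholders, not real model identifiers.

```cel
// true only when the requested model is on the (hypothetical) approved list
llm.requestModel in ["approved-model-a", "approved-model-b"]
```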
Detect prompt content for policy enforcement
Check prompt messages for sensitive patterns. Use with prompt guard policies or custom authorization:
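A sketch using the standard CEL exists() macro and string contains() function. It assumes the prompt array is exposed as llm.prompt and that each message's content is a plain string; both the field name and the pattern are illustrative.

```cel
// true when any message's content contains the placeholder pattern;
// "llm.prompt" is an assumed field name
llm.prompt.exists(m, m.content.contains("EXAMPLE-SECRET"))
```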
Tracing: sample expensive requests
Sample only requests that exceed a token count threshold for detailed tracing:
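A sketch of a threshold expression built from the input and output counts. The 10000 threshold is a placeholder, and the expression can only match once token counts have been populated from the provider's response.

```cel
// true when combined input and output tokens exceed the (hypothetical) threshold
llm.inputTokens + llm.outputTokens > 10000
```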
Provider-specific logic
Apply different policies based on the provider: