Tracking LLM Costs in Production: Per-Model, Per-Request, Per-User Attribution
TL;DR: OpenLIT auto-calculates the cost of every LLM call based on model, token count, and a configurable pricing table. Costs are exported as OpenTelemetry metrics, so you can break them down by service, environment, model, or any custom attribute — and send the data to Grafana, Datadog, or wherever you already monitor things.
The Problem: Your LLM Bill Is a Black Box
You get an invoice from OpenAI at the end of the month. It says $4,200. You have questions:
Which service spent the most?
Was it the summarization feature or the chatbot?
Did someone's runaway test loop burn $800 over the weekend?
Is GPT-4o actually worth the premium over GPT-4o-mini for this use case?
The provider dashboard gives you total tokens and total cost. It doesn't tell you which part of your application consumed what. And if you're using multiple providers (OpenAI + Anthropic + Bedrock), you're reconciling across three different billing dashboards.
You need per-request cost attribution at the application level.
How OpenLIT Tracks Costs
When you call openlit.init(), every LLM request is automatically traced with:
Model name (e.g., gpt-4o, claude-sonnet-4-20250514)
Input tokens (prompt tokens)
Output tokens (completion tokens)
Calculated cost (based on the model's pricing)
The cost calculation happens inside the SDK using a pricing table. Here's the flow:
LLM Call → SDK intercepts → counts tokens → looks up price → emits span + metric
A Minimal Example
import openlit
from openai import OpenAI
openlit.init(
otlp_endpoint="http://localhost:4318",
application_name="summarizer",
environment="production",
)
client = OpenAI()
response = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": "Summarize this quarterly report..."}],
)
The trace span for this request will include attributes like:
gen_ai.usage.input_tokens: 1420
gen_ai.usage.output_tokens: 380
gen_ai.usage.cost: 0.0074
gen_ai.request.model: gpt-4o
gen_ai.system: openai
deployment.environment: production
service.name: summarizer
That gen_ai.usage.cost value is auto-calculated. You didn't have to look up pricing or do any math.
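The arithmetic behind that value is simple. Here's a minimal sketch (the function name is illustrative, not the SDK's internal API), using gpt-4o's per-1K-token list prices from the pricing table:

```python
def calculate_cost(input_tokens: int, output_tokens: int,
                   input_price: float, output_price: float) -> float:
    """Cost in USD, with prices expressed per 1,000 tokens."""
    return (input_tokens / 1000) * input_price + (output_tokens / 1000) * output_price

# gpt-4o at $0.0025/1K input and $0.01/1K output:
cost = calculate_cost(1000, 500, input_price=0.0025, output_price=0.01)  # ~0.0075 USD
```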
How Pricing Works
OpenLIT ships with a built-in pricing table that covers major models from OpenAI, Anthropic, Cohere, Mistral, Google, and others. The table is maintained in a pricing.json file that maps model names to per-token costs.
A simplified version looks like this:
{
"chat": {
"gpt-4o": {"input": 0.0025, "output": 0.01},
"gpt-4o-mini": {"input": 0.00015, "output": 0.0006},
"claude-sonnet-4-20250514": {"input": 0.003, "output": 0.015}
},
"embeddings": {
"text-embedding-3-small": 0.00002,
"text-embedding-ada-002": 0.0001
},
"images": {
"dall-e-3": {
"standard": {"1024x1024": 0.040}
}
}
}
Prices are per 1,000 tokens (for chat/completions) or per unit (for images/embeddings).
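Looking up a price from this structure is just a category-then-model lookup. A hedged sketch (the function and fallback behavior are illustrative, not the SDK's actual internals):

```python
PRICING = {
    "chat": {
        "gpt-4o": {"input": 0.0025, "output": 0.01},
        "gpt-4o-mini": {"input": 0.00015, "output": 0.0006},
    },
    "embeddings": {"text-embedding-3-small": 0.00002},
}

def lookup_price(category: str, model: str):
    """Return the pricing entry for a model, or None if it isn't in the table."""
    return PRICING.get(category, {}).get(model)

lookup_price("chat", "gpt-4o")                        # {"input": 0.0025, "output": 0.01}
lookup_price("embeddings", "text-embedding-3-small")  # a flat per-unit price
```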
Using Custom Pricing
If you're using a model that isn't in the default table — say a fine-tuned model or a provider with custom pricing — you can supply your own:
openlit.init(
pricing_json="/path/to/my-pricing.json",
)
Or pass a URL:
openlit.init(
pricing_json="https://internal.example.com/llm-pricing.json",
)
The SDK fetches and caches it at startup. Use the same JSON structure as the default table, and your custom models will get accurate cost tracking.
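A malformed pricing file silently produces wrong (or missing) costs, so it's worth sanity-checking before handing it to openlit.init(). A small sketch — this validator is our own, not part of OpenLIT:

```python
import json

def validate_pricing(path: str) -> dict:
    """Fail fast on a malformed custom pricing file: every chat model
    must carry both an input and an output per-1K-token price."""
    with open(path) as f:
        table = json.load(f)
    for model, price in table.get("chat", {}).items():
        if not {"input", "output"} <= set(price):
            raise ValueError(f"{model}: missing input/output price")
    return table
```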
Breaking Down Costs by Dimension
Once cost data is flowing, you can slice it by any attribute attached to the trace or metric. The most useful breakdowns:
By Service / Application
If you set application_name per service, costs naturally break down:
# Service A
openlit.init(application_name="chatbot")
# Service B
openlit.init(application_name="summarizer")
# Service C
openlit.init(application_name="code-review-agent")
Now you can answer: "The chatbot costs $2,100/month, the summarizer costs $1,400/month, and the code-review agent costs $700/month."
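Outside a dashboard, that breakdown is a simple aggregation over exported span attributes. A sketch over hypothetical span data (in practice you'd query your metrics backend instead):

```python
from collections import defaultdict

# Hypothetical exported spans, reduced to the two attributes we need
spans = [
    {"service.name": "chatbot", "gen_ai.usage.cost": 0.012},
    {"service.name": "summarizer", "gen_ai.usage.cost": 0.007},
    {"service.name": "chatbot", "gen_ai.usage.cost": 0.009},
]

# Sum cost per service
cost_by_service = defaultdict(float)
for span in spans:
    cost_by_service[span["service.name"]] += span["gen_ai.usage.cost"]
```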
By Model
Every span includes the model name, so you can aggregate cost by model:
GPT-4o: $2,800/month
GPT-4o-mini: $600/month
Claude Sonnet: $800/month
This helps you decide when to downgrade. If GPT-4o-mini gives 90% of the quality at roughly 6% of the per-token price (per the pricing table above) on your summarization task, the numbers make the decision obvious.
By Environment
openlit.init(environment="production")  # vs "staging" vs "development"
If your staging environment is burning $500/month on LLM calls, you probably want to know about it. Common fix: use a cheaper model in staging or add rate limits.
By User (Custom Attributes)
To track costs per user, you need to add the user ID as a span attribute. OpenTelemetry makes this straightforward:
from opentelemetry import trace
tracer = trace.get_tracer(__name__)
def handle_request(user_id: str, message: str):
with tracer.start_as_current_span("user-request") as span:
span.set_attribute("user.id", user_id)
response = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": message}],
)
return response.choices[0].message.content
Now the LLM span (auto-created by OpenLIT) is a child of your user-request span. In your dashboard, you can group costs by user.id.
Building Cost Dashboards
In the OpenLIT Platform
The self-hosted OpenLIT dashboard includes built-in cost views:
Total cost over time — see daily/weekly/monthly trends
Cost by model — bar chart breaking down spend per model
Cost by application — which service is costing you the most
Individual request costs — drill into specific expensive calls
You can also build custom dashboards with the dashboard builder, adding widgets that query ClickHouse directly.
In Grafana
Since OpenLIT exports OTLP, you can build Grafana dashboards with:
Prometheus/Mimir for cost metrics (histograms, counters)
Tempo for trace details with cost attributes
Example PromQL for daily cost by model:
sum by (gen_ai_request_model) (
increase(gen_ai_usage_cost_total[24h])
)
In Datadog
Send OTLP to Datadog's OTLP endpoint. Cost data shows up as custom metrics. Create monitors like:
Alert if daily cost exceeds $X
Alert if a single request costs more than $Y (indicates a runaway prompt)
Weekly cost trend report by service
Setting Up Budget Alerts
The combination of per-request cost tracking and standard metrics backends gives you alerting for free:
Alert: Daily spend exceeds budget
Set up a Grafana or Datadog alert on the cumulative daily cost metric. If sum(gen_ai.usage.cost) over the last 24 hours exceeds your threshold, fire an alert.
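The alert itself reduces to a threshold check on the summed cost. A trivial sketch of the logic your alerting backend evaluates (the function is illustrative — in practice Grafana/Datadog does this for you):

```python
def over_budget(request_costs: list[float], daily_budget: float) -> bool:
    """True if the summed per-request costs for the period exceed the budget."""
    return sum(request_costs) > daily_budget

# Request costs collected over the last 24 hours vs a $50/day budget
over_budget([12.0, 30.0, 15.5], daily_budget=50.0)  # True -> fire the alert
```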
Alert: Anomalous request cost
Some requests cost 100x the average because of unexpectedly long prompts or completions. Track the p99 of gen_ai.usage.cost and alert if it suddenly spikes.
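A minimal version of that spike check, using Python's stdlib percentile function (the baseline and factor are assumptions you'd tune for your traffic):

```python
import statistics

def cost_spike(costs: list[float], baseline_p99: float, factor: float = 5.0) -> bool:
    """Flag when the current p99 request cost jumps well above its historical baseline."""
    current_p99 = statistics.quantiles(costs, n=100)[98]  # 99th percentile cut point
    return current_p99 > factor * baseline_p99
```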
Alert: New model appeared
If someone deploys code that uses an expensive model you didn't approve, you'll see a new gen_ai.request.model value in your metrics. Alert on new label values.
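The check is a set difference between the model labels you observe and the models you've approved. A sketch (the approved list is hypothetical):

```python
# Models your team has signed off on (illustrative list)
APPROVED_MODELS = {"gpt-4o", "gpt-4o-mini", "claude-sonnet-4-20250514"}

def unapproved_models(seen_models: set[str]) -> set[str]:
    """Model labels that appeared in metrics but aren't on the approved list."""
    return seen_models - APPROVED_MODELS
```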
Cost Optimization Strategies
Once you have visibility, optimization follows naturally:
Switch models where quality allows. Compare gpt-4o vs gpt-4o-mini cost with side-by-side quality (use OpenLIT's OpenGround for this). If quality is similar, switch — at current list prices, gpt-4o-mini is over 90% cheaper per token.
Cache repeated prompts. If you see the same prompt pattern in traces (e.g., summarization of the same document), add a cache layer. Zero LLM cost for cache hits.
Reduce context length. If your RAG pipeline stuffs 10 documents into context but the LLM only uses 2, reduce the context window. Fewer input tokens = lower cost.
Set max_tokens. If your completion only needs 100 tokens, set max_tokens=100. This prevents the model from generating unnecessarily long responses.
Batch where possible. Some providers offer lower per-token pricing for batch API calls. If latency isn't critical, batch requests.
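The caching strategy above can be sketched in a few lines — key on a hash of the prompt and only pay for misses (the wrapper below is our own illustration, not an OpenLIT feature; real systems also need TTLs and size limits):

```python
import hashlib

_cache: dict[str, str] = {}

def cached_completion(prompt: str, call_llm) -> str:
    """Return a cached response for repeated prompts; only call the LLM on a miss."""
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_llm(prompt)  # the only path that costs tokens
    return _cache[key]
```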
A Complete Cost Tracking Setup
Here's a production-ready setup with cost tracking, custom attributes, and Grafana export:
import openlit
from openai import OpenAI
from opentelemetry import trace
openlit.init(
otlp_endpoint="https://grafana-otlp.example.com/otlp",
otlp_headers={"Authorization": "Bearer YOUR_GRAFANA_TOKEN"},
application_name="my-saas-api",
environment="production",
)
client = OpenAI()
tracer = trace.get_tracer(__name__)
def generate_response(user_id: str, tier: str, prompt: str) -> str:
model = "gpt-4o" if tier == "enterprise" else "gpt-4o-mini"
with tracer.start_as_current_span("generate") as span:
span.set_attribute("user.id", user_id)
span.set_attribute("user.tier", tier)
response = client.chat.completions.create(
model=model,
messages=[{"role": "user", "content": prompt}],
)
return response.choices[0].message.content
Now you can answer: "Enterprise users cost us $X/month on GPT-4o, free-tier users cost $Y/month on GPT-4o-mini."
FAQ
How do I add custom model pricing?
Create a JSON file following the same structure as the default pricing.json and pass it to openlit.init(pricing_json="/path/to/custom.json"). You can also pass a URL to load pricing from a remote server.
Does it work with fine-tuned models?
Yes. Add your fine-tuned model's name and pricing to a custom pricing JSON. The model name in the JSON must match the model name you pass to the provider's API.
What if the pricing table is outdated?
The default pricing table is updated with each SDK release. Between releases, you can override with a custom JSON pointing to a URL that you control and update as needed.
How accurate is the cost calculation?
It's based on the token count reported by the provider and the per-token price in the pricing table. For chat/completion models, accuracy is very high. For image and embedding models, it depends on the pricing model (per-image, per-token, etc.).
Can I track costs across multiple providers?
Yes. OpenLIT instruments all providers uniformly. If a request goes to OpenAI and another to Anthropic, both get cost attributes. Aggregate them in your dashboard for a unified view.