
How to Scrape LLM API Costs into Grafana Using Prometheus

LLMeter exposes a native Prometheus endpoint at /api/v1/metrics. This guide shows you the full setup: scrape config, PromQL queries, Grafana panel JSON, and alerting rules — copy-paste ready.

Why your LLM bill is missing from your observability stack

Your Grafana dashboards show CPU, memory, latency, and error rates — but not how much you spent on GPT-4o this week. LLM API costs live in a separate provider dashboard that never gets scraped, never triggers alerts, and never appears in your incident retrospectives.

For teams that already run Prometheus + Grafana, adding a new tool just for LLM costs means another dashboard to check, another login to share, and another system to maintain. The better path: expose LLM cost data as Prometheus metrics and let your existing observability stack handle it.

LLMeter ships a native Prometheus endpoint at GET /api/v1/metrics. This post shows you how to wire it up — from scrape config to PromQL to a Grafana panel — in about 15 minutes.

What the endpoint returns

The /api/v1/metrics endpoint outputs standard Prometheus text format. Four metric families are exposed, each labeled by provider and model:

| Metric | Type | Description |
| --- | --- | --- |
| llmeter_cost_usd | gauge | Spend in USD per provider/model |
| llmeter_requests_total | counter | Total API calls per provider/model |
| llmeter_input_tokens_total | counter | Input tokens consumed per provider/model |
| llmeter_output_tokens_total | counter | Output tokens generated per provider/model |

A typical response looks like this:

# HELP llmeter_cost_usd LLM API cost in USD
# TYPE llmeter_cost_usd gauge
llmeter_cost_usd{provider="openai",model="gpt-4o"} 12.47
llmeter_cost_usd{provider="openai",model="gpt-4o-mini"} 1.83
llmeter_cost_usd{provider="anthropic",model="claude-3-5-sonnet"} 8.92

# HELP llmeter_requests_total Total number of LLM API requests
# TYPE llmeter_requests_total counter
llmeter_requests_total{provider="openai",model="gpt-4o"} 1240
llmeter_requests_total{provider="anthropic",model="claude-3-5-sonnet"} 843

# HELP llmeter_input_tokens_total Total input tokens consumed
# TYPE llmeter_input_tokens_total counter
llmeter_input_tokens_total{provider="openai",model="gpt-4o"} 3421800
llmeter_input_tokens_total{provider="anthropic",model="claude-3-5-sonnet"} 2187400
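If you want to sanity-check the format outside Prometheus, the exposition text is easy to parse by hand. This is a throwaway sketch, not an official client; the sample values are copied from the response above, and real scrapes should go through Prometheus itself:

```typescript
// Sketch: parse Prometheus text-format lines and sum llmeter_cost_usd per provider.
const sample = `# TYPE llmeter_cost_usd gauge
llmeter_cost_usd{provider="openai",model="gpt-4o"} 12.47
llmeter_cost_usd{provider="openai",model="gpt-4o-mini"} 1.83
llmeter_cost_usd{provider="anthropic",model="claude-3-5-sonnet"} 8.92`;

function costByProvider(text: string): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const line of text.split("\n")) {
    // Match only the cost metric; skip # HELP / # TYPE comments and other families.
    const m = line.match(/^llmeter_cost_usd\{provider="([^"]+)"[^}]*\}\s+([0-9.]+)$/);
    if (!m) continue;
    totals[m[1]] = (totals[m[1]] ?? 0) + Number(m[2]);
  }
  return totals;
}

console.log(costByProvider(sample)); // openai ≈ 14.30, anthropic ≈ 8.92
```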

Auth uses a Bearer token — the same API key you generate in your LLMeter dashboard under Settings → API Keys. Optional from and to query parameters (ISO 8601) let you scope the aggregation window.

Step 1: Generate an API key

LLMeter API keys are scoped by plan. The metrics endpoint is available on Pro and above. Generate one from Settings → API Keys → New key.

Keys are shown only once. Copy yours now; you will paste it into the Prometheus config in the next step.

Step 2: Add the scrape config to Prometheus

Add this job to your prometheus.yml:

scrape_configs:
  - job_name: llmeter
    scrape_interval: 5m          # LLMeter aggregates hourly; 5m avoids redundant polls
    metrics_path: /api/v1/metrics
    scheme: https
    static_configs:
      - targets:
          - www.llmeter.org
    authorization:
      type: Bearer
      credentials: <YOUR_LLMETER_API_KEY>
    # Optional: the endpoint accepts from/to query params (ISO 8601 timestamps)
    # via a params block; omitted here because static timestamps go stale
    # across scrapes.

A scrape_interval of 5 minutes is a reasonable default — LLMeter aggregates from provider billing APIs hourly, so polling more frequently than that mostly returns unchanged data. For cost-alerting purposes, 5–15 minutes is fine.

If you self-host LLMeter, replace www.llmeter.org with your own domain.

Step 3: Verify the target in Prometheus

Open your Prometheus UI at http://<prometheus-host>:9090/targets. The llmeter job should appear with state UP after the first scrape cycle. If it shows DOWN, check:

  • The Bearer token is correct (no trailing whitespace, no quotes around the value in YAML)
  • Your Prometheus instance can reach www.llmeter.org (outbound HTTPS on port 443)
  • Your LLMeter API key is on a Pro or Team plan (free plan does not expose the metrics endpoint)

You can also test the endpoint directly with curl before configuring Prometheus:

curl -H "Authorization: Bearer <YOUR_API_KEY>" \
     "https://www.llmeter.org/api/v1/metrics"

Step 4: PromQL queries for cost visibility

Once the scrape is running, these PromQL expressions cover the most common questions:

Total spend across all providers

sum(llmeter_cost_usd)

Spend per provider (pie chart input)

sum by (provider) (llmeter_cost_usd)

Top 5 most expensive models

topk(5, sum by (model) (llmeter_cost_usd))
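Average cost per request (derived by combining the metrics above; treat it as an approximation, since the spend gauge and the request counter may not cover identical windows)

```promql
sum by (model) (llmeter_cost_usd)
  /
sum by (model) (llmeter_requests_total)
```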

Output/input token ratio (cost efficiency proxy)

sum by (model) (llmeter_output_tokens_total)
  /
sum by (model) (llmeter_input_tokens_total)

A high ratio (output > 2× input) on a model like GPT-4o is usually a sign of verbose system prompts, missing max_tokens, or an agentic loop generating long completions. Output tokens cost 3–5× more than input tokens on most pricing tiers.
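To see why the ratio matters for cost, here is a back-of-envelope calculation. The per-million-token prices below are placeholder assumptions, not LLMeter data; substitute your provider's current price sheet:

```typescript
// Back-of-envelope: how output-heavy traffic drives cost.
// Prices are ASSUMED placeholders (USD per 1M tokens), not real quotes.
const inputTokens = 3_421_800;  // e.g. the gpt-4o input counter shown earlier
const outputTokens = 1_200_000; // hypothetical output volume
const inputPricePerM = 2.5;     // assumed input price
const outputPricePerM = 10.0;   // assumed output price, 4x input

const inputCost = (inputTokens / 1_000_000) * inputPricePerM;
const outputCost = (outputTokens / 1_000_000) * outputPricePerM;

// Output is roughly a third of the token volume but still the larger cost line.
console.log({ inputCost, outputCost, total: inputCost + outputCost });
```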

Alert: daily spend exceeds $50

# In prometheus/rules/llmeter.yml
groups:
  - name: llmeter
    rules:
      - alert: LLMDailyCostHigh
        expr: sum(llmeter_cost_usd) > 50
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "LLM API spend exceeded $50"
          description: "Current spend: {{ $value | humanize }} USD"
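A threshold alert catches absolute overruns; to catch sudden spikes you can alert on the change instead. This is a sketch that assumes llmeter_cost_usd accumulates over the billing window, so delta() over an hour approximates hourly burn:

```yaml
# Hypothetical companion rule (append under the same group as above):
# fires when spend grows by more than $10 within an hour.
      - alert: LLMSpendSpiking
        expr: sum(delta(llmeter_cost_usd[1h])) > 10
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "LLM spend grew by more than $10 in the last hour"
```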

Step 5: Build a Grafana dashboard

Import the panel JSON below into Grafana (Dashboards → Import → Paste JSON). It creates a four-panel overview: a total-spend stat panel, spend by provider (bar chart), top models (bar chart), and token throughput over time (time series).

{
  "title": "LLM API Cost — LLMeter",
  "panels": [
    {
      "type": "stat",
      "title": "Total Spend (USD)",
      "targets": [{ "expr": "sum(llmeter_cost_usd)", "legendFormat": "Total" }],
      "fieldConfig": { "defaults": { "unit": "currencyUSD" } },
      "gridPos": { "h": 4, "w": 6, "x": 0, "y": 0 }
    },
    {
      "type": "barchart",
      "title": "Spend by Provider",
      "targets": [{
        "expr": "sum by (provider) (llmeter_cost_usd)",
        "legendFormat": "{{ provider }}"
      }],
      "fieldConfig": { "defaults": { "unit": "currencyUSD" } },
      "gridPos": { "h": 8, "w": 12, "x": 0, "y": 4 }
    },
    {
      "type": "barchart",
      "title": "Top Models by Cost",
      "targets": [{
        "expr": "topk(10, sum by (model) (llmeter_cost_usd))",
        "legendFormat": "{{ model }}"
      }],
      "fieldConfig": { "defaults": { "unit": "currencyUSD" } },
      "gridPos": { "h": 8, "w": 12, "x": 12, "y": 4 }
    },
    {
      "type": "timeseries",
      "title": "Token Throughput",
      "targets": [
        {
          "expr": "sum by (provider) (rate(llmeter_input_tokens_total[1h]))",
          "legendFormat": "input — {{ provider }}"
        },
        {
          "expr": "sum by (provider) (rate(llmeter_output_tokens_total[1h]))",
          "legendFormat": "output — {{ provider }}"
        }
      ],
      "fieldConfig": { "defaults": { "unit": "short" } },
      "gridPos": { "h": 8, "w": 24, "x": 0, "y": 12 }
    }
  ]
}

Set the Prometheus data source to whichever instance you configured the scrape job on. The panels work with Grafana 10+ — adjust gridPos values if your dashboard layout differs.
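If you manage Grafana declaratively, the dashboard can also be loaded from disk with Grafana's file-based provisioning instead of a manual import. A minimal sketch; the paths are examples, and the panel JSON above needs to be saved inside a full dashboard object:

```yaml
# /etc/grafana/provisioning/dashboards/llmeter.yml (example path)
apiVersion: 1
providers:
  - name: llmeter
    type: file
    options:
      # Save the dashboard JSON here, e.g. as llmeter.json
      path: /var/lib/grafana/dashboards
```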

How this compares to proxy-based LLM monitoring

Most LLM cost monitoring tools (Helicone, Portkey, LangSmith) sit between your application and the provider as an HTTP proxy. They capture cost data per-request because they relay every API call.

The tradeoff: proxies add 20–80ms of latency per call, require a base_url change in your code, and route every prompt through a third-party server. For compliance-sensitive workloads or latency-critical paths, that is often a non-starter.

LLMeter reads from provider billing APIs instead — the same data your provider dashboard shows, just exposed as Prometheus metrics. No traffic relay, no prompt capture, no base URL change. The Prometheus endpoint aggregates by model and provider, not by individual request, which matches how cost alerts and dashboards are typically built anyway.

If you need per-request tracing (for debugging specific calls or attributing cost to specific users), combine LLMeter with the llmeter npm SDK:

import { wrapOpenAI } from 'llmeter';
import OpenAI from 'openai';

const client = wrapOpenAI(new OpenAI(), { apiKey: process.env.LLMETER_API_KEY });

// Every call is tracked — model, tokens, cost, optional customer ID
const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Explain Prometheus counter vs gauge.' }],
  llmeter_customer_id: 'user_abc123',  // optional: per-customer attribution
});

Per-request data flows into LLMeter's database and is included in the /api/v1/metrics aggregation — so the Grafana dashboard above reflects SDK-tracked calls alongside API-level polling data.

What this looks like in practice

A typical SRE setup after following this guide:

  • Grafana "LLM API Cost" row added to the main engineering dashboard, next to the existing infra panels
  • PagerDuty alert fired when sum(llmeter_cost_usd) crosses the daily budget threshold — same on-call rotation as service latency alerts
  • Weekly Slack digest (via Grafana alerting) showing spend by model and provider — replaces the "who checked the OpenAI invoice last week?" Slack thread
  • Output/input ratio alert flags when a specific model's completion length spikes (usually an accidental removal of max_tokens in a PR)

Getting started

The /api/v1/metrics endpoint is live for all Pro and Team plan users. Free accounts can use the LLMeter dashboard and budget alerts — the Prometheus endpoint is a Pro feature because it is typically used by teams with existing observability infrastructure.

If you already use Grafana for infra monitoring, adding LLM cost visibility is literally a copy-paste — the scrape config above is the entire integration. No new dashboarding tool, no new login, no new alert routing.

Add LLM costs to your Grafana stack

Connect your first provider in 30 seconds. Prometheus endpoint available on Pro.

Start Free — No credit card required