How to Track Anthropic API Costs: Claude Spend by Model and Customer
Anthropic's console shows total Claude spend but not which model, feature, or customer drove it. Here's how to get per-model and per-customer cost breakdowns using the Anthropic Usage API and the LLMeter SDK.
Why the Anthropic console is not enough
Anthropic’s console at console.anthropic.com shows your current-month spend and a per-day cost graph. It is enough to confirm a billing spike happened — but it does not answer the questions that engineering and product teams need:
- Is `claude-3-5-sonnet` or `claude-3-haiku` driving the bill?
- Which feature — summarization, chat, document analysis — is most expensive?
- Is one customer responsible for 40% of your Claude spend?
- Did switching from Sonnet to Haiku for batch jobs actually save money?
Answering these requires going beyond the console. Anthropic provides a Usage API and a token-based billing model; combining both with per-call instrumentation gives you the full picture.
Anthropic’s model pricing: why the input/output split matters
Claude models use asymmetric token pricing — output tokens cost significantly more than input tokens. Current pricing for the main tiers:
| Model | Input ($/1M) | Output ($/1M) |
|---|---|---|
| claude-opus-4 | $15.00 | $75.00 |
| claude-sonnet-4 | $3.00 | $15.00 |
| claude-3-5-sonnet | $3.00 | $15.00 |
| claude-3-5-haiku | $0.80 | $4.00 |
| claude-3-haiku | $0.25 | $1.25 |
The 5:1 output/input ratio is the first thing most teams miss. A chat feature with long responses can have an effective cost rate 3–4× higher than what the “input price” implies. Tracking both token types separately — not just total tokens — is essential for accurate cost attribution.
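To make that concrete, here is a back-of-envelope calculation for a hypothetical chat request; the request sizes are assumptions, the prices are from the table above:

```typescript
// Hypothetical claude-3-5-sonnet chat call: 500 input tokens, 1,500 output tokens.
const INPUT_PRICE = 3.0; // $ per 1M input tokens
const OUTPUT_PRICE = 15.0; // $ per 1M output tokens

const inputCost = (500 / 1_000_000) * INPUT_PRICE; // $0.0015
const outputCost = (1_500 / 1_000_000) * OUTPUT_PRICE; // $0.0225

// $0.024 for 2,000 total tokens is an effective rate of $12/1M tokens,
// i.e. 4x the $3/1M "input price" most teams quote.
console.log({ inputCost, outputCost, total: inputCost + outputCost });
```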
Approach 1: Anthropic Usage API (no code changes)
Anthropic provides a Usage API (`GET https://api.anthropic.com/v1/organizations/usage_report/messages`) that returns aggregated token counts in time buckets, optionally grouped by model, workspace, or API key. It requires an Admin API key (created in the console's organization settings), not a standard API key.
A minimal polling script that pulls the last 7 days of spend:
```typescript
const ADMIN_KEY = process.env.ANTHROPIC_ADMIN_KEY;

// Prices in $ per 1M tokens; keep in sync with Anthropic's published pricing.
const PRICING: Record<string, { input: number; output: number }> = {
  'claude-opus-4': { input: 15.0, output: 75.0 },
  'claude-sonnet-4': { input: 3.0, output: 15.0 },
  'claude-3-5-sonnet-20241022': { input: 3.0, output: 15.0 },
  'claude-3-5-haiku-20241022': { input: 0.8, output: 4.0 },
  'claude-3-haiku-20240307': { input: 0.25, output: 1.25 },
};

async function getDailySpend(startDate: string, endDate: string) {
  const url = new URL(
    'https://api.anthropic.com/v1/organizations/usage_report/messages',
  );
  url.searchParams.set('starting_at', startDate); // RFC 3339 timestamps
  url.searchParams.set('ending_at', endDate);
  url.searchParams.set('bucket_width', '1d');
  url.searchParams.append('group_by[]', 'model');

  const res = await fetch(url, {
    headers: { 'x-api-key': ADMIN_KEY!, 'anthropic-version': '2023-06-01' },
  });
  const data = await res.json();

  // One time bucket per day; group_by gives one result row per model.
  return data.data.flatMap((bucket: { starting_at: string; results: any[] }) =>
    bucket.results.map((r) => {
      const p = PRICING[r.model] ?? { input: 0, output: 0 };
      const cost =
        (r.uncached_input_tokens / 1_000_000) * p.input +
        (r.output_tokens / 1_000_000) * p.output;
      return { date: bucket.starting_at, model: r.model, cost_usd: cost.toFixed(6) };
    }),
  );
}
```

Note the `cache_read_input_tokens` field on each result row — Anthropic charges a discounted rate for prompt cache hits (currently 10% of the normal input price), so a full cost calculation should add cache reads at the discounted rate rather than dropping them or billing them at full price.
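A cache-aware version of the cost calculation, assuming the same result-row fields as in the script above:

```typescript
// Cache reads are billed at roughly 10% of the normal input price.
function rowCost(
  r: {
    uncached_input_tokens: number;
    cache_read_input_tokens: number;
    output_tokens: number;
  },
  p: { input: number; output: number },
): number {
  return (
    (r.uncached_input_tokens / 1_000_000) * p.input +
    (r.cache_read_input_tokens / 1_000_000) * p.input * 0.1 +
    (r.output_tokens / 1_000_000) * p.output
  );
}
```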
The limitation is the same as OpenAI’s Usage API: data is aggregated at the organization level. You cannot see which of your services or customers generated a specific request.
Approach 2: SDK wrapper for per-customer attribution
For per-customer or per-feature cost breakdown, you need to capture token counts at the call site. The Anthropic SDK returns `usage.input_tokens` and `usage.output_tokens` on every response — wrapping the client gives you a natural interception point.
```typescript
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic();

// Prices in $ per 1M tokens.
const PRICING: Record<string, { input: number; output: number }> = {
  'claude-3-5-sonnet-20241022': { input: 3.0, output: 15.0 },
  'claude-3-5-haiku-20241022': { input: 0.8, output: 4.0 },
};

// Stub: replace with a write to your metrics store or billing table.
async function recordCost(entry: { customerId: string; model: string; cost: number }) {
  console.log('claude-cost', entry);
}

async function trackedMessage(
  params: Anthropic.MessageCreateParamsNonStreaming,
  customerId: string,
) {
  const res = await anthropic.messages.create(params);
  const { input_tokens, output_tokens } = res.usage;
  const p = PRICING[params.model];
  if (p) {
    const cost =
      (input_tokens / 1_000_000) * p.input +
      (output_tokens / 1_000_000) * p.output;
    await recordCost({ customerId, model: params.model, cost });
  }
  return res;
}
```

The downside: you maintain a pricing table that drifts as Anthropic releases new model versions, and you add instrumentation overhead at every call site.
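Still, the call-site ergonomics are simple: swap `anthropic.messages.create` for the wrapper. The customer ID here is a hypothetical value from your auth layer:

```typescript
const reply = await trackedMessage(
  {
    model: 'claude-3-5-haiku-20241022',
    max_tokens: 512,
    messages: [{ role: 'user', content: 'Summarize this support ticket: ...' }],
  },
  'cus_1234', // hypothetical customer identifier
);
```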
Approach 3: LLMeter for unified Anthropic cost monitoring
LLMeter combines both approaches — it polls the Anthropic Usage API automatically for authoritative billing data, and provides a zero-dependency SDK wrapper with a maintained pricing catalog:
```typescript
import Anthropic from '@anthropic-ai/sdk';
import { LLMeter, wrapAnthropic } from '@simplifai-solutions/llmeter';

const meter = new LLMeter({ apiKey: process.env.LLMETER_KEY! });
const anthropic = wrapAnthropic(new Anthropic(), meter);

// All calls tracked automatically.
// Pass llmeter_customer_id to attribute spend to a specific user.
const res = await anthropic.messages.create({
  model: 'claude-3-5-sonnet-20241022',
  max_tokens: 1024,
  messages: [{ role: 'user', content: prompt }],
  llmeter_customer_id: req.user.id,
});
```

The wrapper calls api.anthropic.com directly — no proxy, no latency overhead, no prompt exposure to a third party. Token counts are recorded after the response returns and sent to LLMeter in background batches.
Prompt caching: the hidden cost reducer
Anthropic’s prompt caching feature lets you mark a large static block (system prompt, document context, tool definitions) as cacheable. Subsequent requests that reuse the same block pay only 10% of the input price for the cached portion — and cache writes cost 25% more but are amortized across all subsequent reads.
For workloads where you send a large document or system prompt on every call, caching can reduce input token costs by 70–90%. Tracking cache hit rates alongside cost gives you a clear signal of whether your caching strategy is working — or whether a code change inadvertently broke cache reuse.
LLMeter records cache_read_input_tokens and cache_creation_input_tokens from the Anthropic response alongside regular token counts, so your cost dashboard reflects the discounted rate automatically.
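Enabling caching is a one-line change per static block. A minimal sketch with the Anthropic SDK; the document and question variables are placeholders:

```typescript
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic();

// Mark the large static block as cacheable; only the user turn varies per call.
const res = await anthropic.messages.create({
  model: 'claude-3-5-sonnet-20241022',
  max_tokens: 1024,
  system: [
    {
      type: 'text',
      text: largeReferenceDocument, // placeholder: large static context
      cache_control: { type: 'ephemeral' },
    },
  ],
  messages: [{ role: 'user', content: userQuestion }],
});

// On subsequent calls, usage shows the discounted-read vs. full-write split.
console.log(res.usage.cache_read_input_tokens, res.usage.cache_creation_input_tokens);
```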
What per-model tracking reveals
Once you have model-level and customer-level attribution, a few patterns surface quickly:
- Sonnet where Haiku would do. Claude 3.5 Sonnet costs 12× more than Claude 3 Haiku per token. Classification, extraction, and short-answer tasks often run equally well on Haiku. Teams that set up per-feature cost tracking routinely find 30–50% of Sonnet usage can move to Haiku with no quality regression; a quick what-if calculation follows this list.
- Output-heavy features are the real cost drivers. A feature generating 2,000-token responses costs 5× more per call than one generating 400 tokens, even at the same input price. Per-feature breakdown makes this visible before it becomes a scaling problem.
- One customer outlier. In multi-tenant SaaS products, a single power user often generates 5–20% of total Claude spend. Without per-customer attribution, that customer is invisibly subsidized by everyone else. Per-customer cost tracking is the prerequisite for fair usage enforcement or usage-based pricing.
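The what-if arithmetic for that first pattern, with hypothetical traffic volumes:

```typescript
// Hypothetical month on claude-3-5-sonnet: 800M input + 200M output tokens.
const sonnet = { input: 3.0, output: 15.0 }; // $ per 1M tokens
const haiku = { input: 0.25, output: 1.25 }; // claude-3-haiku, $ per 1M tokens

const current = 800 * sonnet.input + 200 * sonnet.output; // $5,400/month

// Move 40% of the traffic to Haiku, keep 60% on Sonnet:
const after =
  0.6 * current + 0.4 * (800 * haiku.input + 200 * haiku.output); // $3,420/month

console.log({ current, after, saved: current - after }); // ~$1,980 saved (~37%)
```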
Setting cost alerts for Claude spend
Cost visibility without alerting is incomplete. Two alert types cover most production needs (a minimal sketch of both checks follows the list):
- Threshold alert: trigger when total spend or per-customer spend crosses a fixed amount (e.g., $50/day). Catches runaway batch jobs or DDoS-style abuse before they compound.
- Anomaly alert: trigger when spend deviates from the statistical baseline by more than N standard deviations. Catches gradual drift — a prompt that grew 3× over two weeks, or a feature rollout that silently increased model tier.
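A sketch of both checks, assuming you already aggregate spend into a per-day series; the function name and default window parameters are hypothetical:

```typescript
// dailySpend: one total per day, oldest first; the last entry is today.
function checkAlerts(dailySpend: number[], thresholdUsd = 50, sigmas = 3) {
  const today = dailySpend[dailySpend.length - 1];

  // Threshold alert: fixed ceiling on today's spend.
  const thresholdBreached = today > thresholdUsd;

  // Anomaly alert: compare today against the baseline window's mean/stddev.
  const baseline = dailySpend.slice(0, -1);
  const mean = baseline.reduce((a, b) => a + b, 0) / baseline.length;
  const stddev = Math.sqrt(
    baseline.reduce((a, b) => a + (b - mean) ** 2, 0) / baseline.length,
  );
  const anomalous = stddev > 0 && (today - mean) / stddev > sigmas;

  return { thresholdBreached, anomalous };
}
```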
LLMeter supports both alert types for Anthropic providers, with notifications via email or Slack webhook. See the LLM API budget alerts guide for the full setup walkthrough.
Track Anthropic API costs by model and customer — no proxy required.
Free forever for one provider. Upgrade anytime.
Further reading
- How to Track OpenAI API Costs Per Model, Project, and Customer — the same approach applied to OpenAI, with Usage API examples.
- 5 Proven Ways to Reduce LLM API Costs Without Sacrificing Quality — once you can see where spend is going, here is how to cut it.
- LLM Model Pricing Comparison 2026 — full Anthropic Claude pricing table alongside OpenAI, Mistral, DeepSeek, and Google AI.
- LLM Cost Monitoring Without a Proxy — why proxyless monitoring matters for latency-sensitive production workloads.