
How to Track OpenAI API Costs Per Model, Project, and Customer in 2026

OpenAI's dashboard shows total spend but not which model, project, or customer drove it. Here's how to get per-model and per-customer cost breakdowns — with and without a proxy.

Why the OpenAI dashboard is not enough

OpenAI’s usage dashboard gives you a total spend number and a per-day bar chart. That’s useful for spotting a billing spike, but it answers exactly one question: “How much did we spend this month?”

The questions engineering and product teams actually need answered are:

  • Which model is driving cost — gpt-4o or gpt-4o-mini?
  • Which feature or service (search, summarization, onboarding) is the most expensive?
  • Which customer accounts are responsible for a disproportionate share of spend?
  • Did that prompt optimization last week actually reduce costs, or just shift them?

OpenAI’s native tooling does not break spend down by project or per customer. Getting those answers requires one of three approaches.

Approach 1: OpenAI Usage API (no code changes)

OpenAI has a Usage API at https://api.openai.com/v1/usage that returns daily token counts by model for any date range. Note that dollar cost is not included in the response — you compute it from the token counts and a pricing table, or pull dollar figures from the separate Costs endpoint. Authentication uses an API key with usage read permission; the newer organization-level usage and costs endpoints documented in the API reference require an Admin API key.

The response includes a data array of daily buckets, each with n_requests, n_context_tokens_total, n_generated_tokens_total, and operation (completions vs embeddings vs images). You can aggregate by model by grouping on the snapshot_id field.
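As a sketch of that aggregation — assuming the daily-bucket response shape described above; the `date` query parameter and the `OPENAI_ADMIN_KEY` variable are assumptions, so check the current API reference for your account — fetching a day of usage and grouping by model looks like this:

```typescript
type UsageBucket = {
  snapshot_id: string;            // model name, e.g. "gpt-4o-2024-11-20"
  n_requests: number;
  n_context_tokens_total: number;
  n_generated_tokens_total: number;
};

// Sum request and token counts per model across a day's buckets.
function aggregateByModel(buckets: UsageBucket[]) {
  const totals: Record<string, { requests: number; input: number; output: number }> = {};
  for (const b of buckets) {
    const t = (totals[b.snapshot_id] ??= { requests: 0, input: 0, output: 0 });
    t.requests += b.n_requests;
    t.input += b.n_context_tokens_total;
    t.output += b.n_generated_tokens_total;
  }
  return totals;
}

// Fetch one day of org-wide usage. The `date` parameter follows the legacy
// endpoint's convention; newer organization endpoints paginate differently.
async function fetchUsage(date: string): Promise<UsageBucket[]> {
  const res = await fetch(`https://api.openai.com/v1/usage?date=${date}`, {
    headers: { Authorization: `Bearer ${process.env.OPENAI_ADMIN_KEY}` },
  });
  if (!res.ok) throw new Error(`Usage API returned ${res.status}`);
  const body = (await res.json()) as { data: UsageBucket[] };
  return body.data;
}
```

`aggregateByModel(await fetchUsage('2026-01-15'))` then yields a per-model token map you can price against your own pricing table.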

The limitation: the API aggregates across your entire organization. It does not segment by your application’s internal projects, services, or end users. For per-customer breakdowns, you need one of the approaches below.

Approach 2: SDK wrapper for per-request attribution

If you want cost attributed to a specific customer, feature, or service, you need to capture token counts at the call site and tag them with your own metadata before they aggregate into the OpenAI black box.

The standard pattern is a thin wrapper around the OpenAI client that intercepts chat.completions.create and records usage.prompt_tokens + usage.completion_tokens from the response:

import OpenAI from 'openai';

const openai = new OpenAI();
const PRICING = {
  'gpt-4o': { input: 2.50, output: 10.00 },         // $ per 1M tokens
  'gpt-4o-mini': { input: 0.15, output: 0.60 },
};

// recordCost is your own persistence hook (DB insert, metrics emit, etc.)
async function trackedComplete(
  params: OpenAI.Chat.ChatCompletionCreateParamsNonStreaming,
  customerId: string,
) {
  const res = await openai.chat.completions.create(params);
  const { prompt_tokens, completion_tokens } = res.usage ?? {};
  const p = PRICING[params.model as keyof typeof PRICING];
  if (p && prompt_tokens != null && completion_tokens != null) {
    const cost =
      (prompt_tokens / 1_000_000) * p.input +
      (completion_tokens / 1_000_000) * p.output;
    await recordCost({ customerId, model: params.model, cost });
  }
  return res;
}

This works, but it requires you to maintain a pricing catalog that stays in sync with OpenAI’s frequently-updated model list — including versioned aliases like gpt-4o-2024-11-20. It also means adding instrumentation to every call site in your codebase.
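The `recordCost` sink is up to you. A minimal in-memory version — all names here are illustrative; in production you would swap the array for a database table — might look like:

```typescript
type CostEvent = { customerId: string; model: string; cost: number };

const events: CostEvent[] = [];

// In production this would be an INSERT into a costs table or a metrics emit.
async function recordCost(event: CostEvent): Promise<void> {
  events.push(event);
}

// Roll up recorded spend per customer, e.g. for a daily report.
function costByCustomer(all: CostEvent[]): Record<string, number> {
  const out: Record<string, number> = {};
  for (const e of all) out[e.customerId] = (out[e.customerId] ?? 0) + e.cost;
  return out;
}
```

The same rollup grouped on `model` instead of `customerId` gives you the per-model view without a second code path.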

Approach 3: Cost monitoring tool with per-customer attribution

A dedicated cost monitoring tool combines both approaches — it polls the OpenAI Usage API for authoritative billing data, and provides a zero-dependency SDK wrapper for per-customer tagging — so you get the best of both without building the infrastructure yourself.

LLMeter works this way. You connect a read-only OpenAI key and it reconstructs your spend by model and day automatically. For per-customer attribution, the llmeter SDK wraps openai in two lines and records a customer_id alongside each call:

import OpenAI from 'openai';
import { LLMeter, wrapOpenAI } from '@simplifai-solutions/llmeter';

const meter = new LLMeter({ apiKey: process.env.LLMETER_KEY! });
const openai = wrapOpenAI(new OpenAI(), meter);

// Every call is tracked automatically.
// Pass llmeter_customer_id to attribute cost to a specific user.
const res = await openai.chat.completions.create({
  model: 'gpt-4o-mini',
  messages: [{ role: 'user', content: prompt }],
  llmeter_customer_id: req.user.id,
});

The wrapper is not a proxy — it calls api.openai.com directly and records token counts after the response returns. No latency overhead, no prompt exposure to a third party.

Per-model cost breakdown: what to expect

Once you have model-level tracking in place, a few patterns appear consistently across engineering teams:

  • gpt-4o is usually 5–15% of requests but 40–70% of cost. It gets used where gpt-4o-mini would have been fine, because teams default to the latest flagship model during development and never revisit the decision.
  • Embeddings are often invisible. Vector search pipelines generate millions of embedding requests per month at low per-request cost, but it compounds. The OpenAI Usage API separates embeddings from completions — most dashboards do not.
  • One customer drives 30–50% of spend. Almost every B2B SaaS team doing per-customer attribution for the first time is surprised by how concentrated usage actually is. The fix is per-customer rate limiting, which you can’t implement without first knowing who is spending what.
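Once attribution is in place, a per-customer limiter can start as a plain daily budget check. A sketch (the class name, limit, and storage are illustrative; a real version would reset the map daily and live in shared storage):

```typescript
// Track spend per customer for the current day and refuse calls over budget.
class CustomerBudget {
  private spent = new Map<string, number>();

  constructor(private dailyLimitUsd: number) {}

  // Check before each OpenAI request; reject or queue when over budget.
  allow(customerId: string): boolean {
    return (this.spent.get(customerId) ?? 0) < this.dailyLimitUsd;
  }

  // Record after each request, with the cost computed from usage tokens.
  record(customerId: string, costUsd: number): void {
    this.spent.set(customerId, (this.spent.get(customerId) ?? 0) + costUsd);
  }
}
```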

The project-level problem: API keys do not map to services

Most teams use a single OpenAI API key across multiple services — a chat feature, a background summarization job, a search index, and an internal tool. The Usage API returns aggregated spend for the key, not per-service.

The cleanest fix is to use separate API keys per service or environment. OpenAI allows multiple keys under a single organization. Each key’s usage shows up separately in the Usage API, giving you a natural project-level split without any code instrumentation.

If key proliferation is a concern, the SDK wrapper approach with a project tag is the alternative: same key, but every call records which service made it.
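As a sketch of that alternative (field and function names are illustrative), the wrapper from Approach 2 extends naturally: build one attribution row per call that carries a project tag alongside the optional customer ID, and persist it in your sink.

```typescript
type Attribution = { project: string; customerId?: string };

type UsageRow = Attribution & {
  model: string;
  inputTokens: number;
  outputTokens: number;
};

// Same interception pattern as Approach 2, but every persisted row carries a
// project tag, so one shared API key still yields a per-service breakdown.
function attributeUsage(
  model: string,
  usage: { prompt_tokens: number; completion_tokens: number },
  attr: Attribution,
): UsageRow {
  return {
    ...attr,
    model,
    inputTokens: usage.prompt_tokens,
    outputTokens: usage.completion_tokens,
  };
}
```

Inside a wrapper like `trackedComplete`, you would call `attributeUsage(params.model, res.usage, { project: 'search' })` and insert the resulting row wherever you record costs.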

Putting it together: a practical monitoring stack

  1. Connect a read-only key to a cost monitoring tool (or poll the Usage API yourself) for authoritative billing data by model and day.
  2. Add SDK instrumentation for per-customer or per-service attribution where the Usage API is too coarse.
  3. Set spend alerts at both the total and per-customer level so that one runaway agent or abusive account triggers a notification before it hits your invoice.
  4. Review per-model breakdown weekly to catch cases where an expensive model is being used where a cheaper one would be fine.
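Step 3 can start as a plain threshold check over the attributed spend — a sketch, with thresholds and message formats purely illustrative:

```typescript
type Thresholds = { totalUsd: number; perCustomerUsd: number };

// Return human-readable alerts when total daily spend, or any single
// customer's daily spend, crosses its threshold.
function spendAlerts(
  perCustomer: Record<string, number>,
  t: Thresholds,
): string[] {
  const alerts: string[] = [];
  const total = Object.values(perCustomer).reduce((a, b) => a + b, 0);
  if (total > t.totalUsd) {
    alerts.push(`total spend $${total.toFixed(2)} exceeds $${t.totalUsd}`);
  }
  for (const [customer, spend] of Object.entries(perCustomer)) {
    if (spend > t.perCustomerUsd) {
      alerts.push(`customer ${customer} spent $${spend.toFixed(2)}`);
    }
  }
  return alerts;
}
```

Run it on a schedule against the per-customer rollup and route non-empty results to Slack or PagerDuty.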

The full stack takes about 30 minutes to set up the first time. The ongoing overhead is close to zero once alerts are configured.

Track OpenAI costs per model, project, and customer — no proxy required.

Free forever for one provider. Upgrade anytime.

Start Free

Further reading