Cost Calculator

AI API Cost Estimator

Calculate your monthly AI agent costs with precision.

Estimated Monthly Cost
$84

Based on your usage below

Model Selection

Main agent: drives conversations, research, coding, cron
Sub-agent: spawned for delegated tasks
Heartbeat: lightweight check-ins

Prices shown per 1M tokens · Input / Output

Optimizations

Daily Usage

Conversations: 20 per day (range 0 to 1,000)

Back-and-forth chats your agent handles each day: customer questions, internal pings, anything that looks like a thread.

Research tasks: 3 per day (range 0 to 200)

Heavier jobs where the agent gathers, reads, and summarizes information, such as competitor scans, market briefs, and prospect digs.

Coding sessions: 2 per day (range 0 to 150)

Sessions where the agent writes, edits, or reviews code. Long context windows make these the most expensive runs.

Heartbeats: 20 per day (range 0 to 2,000)

Tiny scheduled pings that keep the agent awake and watching for triggers. Small individually, big in aggregate.

Cron jobs: 5 per day (range 0 to 500)

Scheduled background runs such as daily reports, nightly syncs, and weekly cleanups. Predictable and easy to budget for.

Sub-agent spawns: 3 per day (range 0 to 300)

Times the main agent delegates work to a specialist sub-agent. Use a cheaper model here without sacrificing main-agent quality.

Cost Breakdown

Current Setup
$84 /mo
Main: Claude Sonnet 4.6 · Sub: Claude Haiku 4.5 · Heartbeat: Claude Sonnet 4.6
Optimized Setup
$77 /mo
Sub-agents on Flash, Heartbeats on Nano
Save $7.66/mo (9%)
Where the money goes (monthly)
  • Conversations: $28.80 (34%)
  • Coding: $19.80 (23%)
  • Research: $16.20 (19%)
  • Cron: $10.13 (12%)
  • Heartbeats: $5.40 (6%)
  • Sub-agents: $4.05 (5%)
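
The breakdown figures can be cross-checked with a few lines of arithmetic. The sketch below sums the category costs, recomputes each share, and derives an implied cost per run. Two things here are assumptions rather than facts from the page: the pairing of each Daily Usage number with a category (inferred from the order of the descriptions) and the 30-day month.

```python
# Monthly cost per category in USD, copied from the breakdown above.
BREAKDOWN = {
    "Conversations": 28.80,
    "Coding": 19.80,
    "Research": 16.20,
    "Cron": 10.13,
    "Heartbeats": 5.40,
    "Sub-agents": 4.05,
}

# Daily run counts from the Daily Usage section.
# ASSUMPTION: which number belongs to which category is inferred, not stated.
DAILY_RUNS = {
    "Conversations": 20,
    "Research": 3,
    "Coding": 2,
    "Heartbeats": 20,
    "Cron": 5,
    "Sub-agents": 3,
}

DAYS_PER_MONTH = 30  # ASSUMPTION: the page does not state its month length
SAVINGS = 7.66       # "Save $7.66/mo" from the Optimized Setup line


def summarize(breakdown, daily_runs, days=DAYS_PER_MONTH):
    """Return the monthly total and (name, cost, share %, cost per run) rows."""
    total = sum(breakdown.values())
    rows = []
    for name, cost in breakdown.items():
        share = round(100 * cost / total)
        per_run = cost / (daily_runs[name] * days)
        rows.append((name, cost, share, per_run))
    return total, rows


total, rows = summarize(BREAKDOWN, DAILY_RUNS)
print(f"Total: ${total:.2f}/mo")
for name, cost, share, per_run in rows:
    print(f"{name:<14} ${cost:>6.2f}  {share:>2}%  ~${per_run:.4f} per run")
print(f"Optimized: ${total - SAVINGS:.2f}/mo ({100 * SAVINGS / total:.0f}% saved)")
```

Under these assumptions the categories sum to $84.38, which the page rounds to $84, and the optimized figure comes out at $76.72, which it rounds to $77.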

Glossary

Token
A chunk of text the model reads or writes. Roughly 4 characters or three-quarters of a word in English. “Hello, world!” is about 4 tokens.
Input Tokens
Everything you send the model: your message, the system prompt, conversation history, tool definitions, and any documents in context. Priced lower than output.
Output Tokens
Everything the model generates back. Usually 3–5× the price of input tokens because generation is more compute-intensive than reading.
MTok
Short for one million tokens. Pricing is shown per MTok: for example, an input price of “$3 / MTok” means $3 per million input tokens.
Prompt Caching
Storing the unchanging part of your prompt (system instructions, long documents) so the model skips re-processing it on every request. Can cut the input bill by up to ~90% on the cached portion, depending on the provider.
Context Window
The maximum number of tokens a model can consider at one time. A bigger window handles longer threads and larger documents, but every token you put in it is billed on each call.
Main Agent
The primary model your assistant uses for user-facing work — conversations, research, coding. Quality matters most here.
Sub-Agent
A secondary model the main agent delegates narrower tasks to. Usually a cheaper model since the work is bounded and well-defined.
Heartbeat
A lightweight scheduled ping that keeps an agent loop alive between user inputs. Small per-call, but adds up at scale.
Cron Job
A task that runs automatically on a schedule — hourly, daily, weekly. Named after the Unix cron utility.
Batch API
A discounted lane for non-realtime work. Submit a batch, get results within about 24 hours, and pay roughly 50% less at the major providers. Great for nightly jobs and back-office automation.
Sub-Agent Spawn
Each time the main agent kicks off a sub-agent to handle a specific task. One spawn = one delegated job.
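
Several glossary terms combine into one per-request cost formula. The sketch below is illustrative only and is not the calculator's actual code. The function names, the $15 output price (5× the $3 input example, within the 3–5× range quoted above), and the choice to stack the batch discount on top of caching are all assumptions. It also ignores any surcharge a provider may apply when writing to the cache.

```python
import math

TOKENS_PER_MTOK = 1_000_000


def estimate_tokens(text: str) -> int:
    """Glossary rule of thumb: roughly 4 characters per token (English)."""
    return math.ceil(len(text) / 4)


def request_cost(input_tokens, output_tokens, input_price, output_price,
                 cached_tokens=0, cache_discount=0.90, batch=False):
    """Estimated USD cost of one request. Prices are USD per MTok.

    cached_tokens is the part of input_tokens read from the prompt cache;
    cache_discount=0.90 mirrors the "~90%" figure in the glossary.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    # Cached tokens are billed at (1 - discount) of the normal input price.
    billable_input = fresh_tokens + cached_tokens * (1 - cache_discount)
    cost = (billable_input * input_price
            + output_tokens * output_price) / TOKENS_PER_MTOK
    if batch:
        # ASSUMPTION: the 50% batch discount applies to the whole request
        # and stacks with caching; check your provider's terms.
        cost *= 0.5
    return cost


# Hypothetical request: 10,000 input tokens, 1,000 output tokens,
# priced at $3 / MTok input and $15 / MTok output.
plain = request_cost(10_000, 1_000, 3.0, 15.0)
cached = request_cost(10_000, 1_000, 3.0, 15.0, cached_tokens=8_000)
batched = request_cost(10_000, 1_000, 3.0, 15.0, batch=True)

print(estimate_tokens("Hello, world!"))  # 13 characters -> 4 tokens
print(f"plain ${plain:.4f}  cached ${cached:.4f}  batched ${batched:.4f}")
```

With these hypothetical numbers, caching 8,000 of the 10,000 input tokens takes the request from $0.0450 to $0.0234. Output tokens are unaffected by caching, so the savings depend on how input-heavy a workload is.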
Talk to Apollo Claw

Spending more than you'd like?

We've cut client AI bills 40-70% through smarter model routing, prompt caching, and right-sizing every agent for the job. Book a 30-minute discovery call and we'll show you exactly where the savings are.

Estimates only. Real costs depend on your actual token usage and provider rates.
