Reduce Your OpenClaw AI Costs by 97%
I dug into GitHub issues, API pricing docs, and community forums to find every optimization that actually works. Six config-level changes that took my setup from $1,500+/month projections down to under $50.
The Real Cost Numbers
I spent weeks digging through GitHub issues, Discord threads, and real billing data to understand where OpenClaw costs actually come from. The software itself is free and MIT-licensed. The cost is entirely in API calls. Here's what the data shows.
While researching costs across community forums and GitHub discussions, I found users reporting 180 million tokens burned in a single month, roughly $3,600 in API calls for what's supposed to be a "free" tool. And that's not an isolated case. Here's what I found across community reports:
| Usage Level | Monthly Cost | Tokens/Month |
|---|---|---|
| Light (1-2 hrs/day, simple tasks) | $10-30 | ~5M |
| Regular (daily development) | $40-80 | ~20M |
| Power user (complex automations) | $100-200 | ~50M |
| Extreme (heavy automation, 24/7) | $3,600 | 180M |
90% of users spend under $12/day. But the tail is long. One misconfigured heartbeat can burn $50/day. An automation loop left running overnight can rack up $200-500.
The optimizations below target specific cost drivers. Each one is independent. Start with the one that matches your biggest spending category.
Current Claude API Pricing (Feb 2026)
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Best For |
|---|---|---|---|
| Opus 4.5 | $5.00 | $25.00 | Complex reasoning, architecture |
| Sonnet 4.5 | $3.00 | $15.00 | General development, code review |
| Haiku 4.5 | $1.00 | $5.00 | Routine tasks, status checks |
| Gemini Flash-Lite | $0.50 | $0.50 | Heartbeats, pings |
| DeepSeek V3.2 | $0.30 | $1.20 | Simple completions |
The gap between Sonnet and Haiku is 3x on both input and output. For tasks that don't need Sonnet's reasoning, you're paying triple for nothing. And DeepSeek V3.2's input price is roughly a tenth of Sonnet's for simple completions.
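To see what that spread means per request, here's a quick back-of-envelope using the table above (the 4K-input/1K-output request size is my assumption, not a measured average):

```python
# Rough per-request cost comparison from the pricing table above.
PRICES = {  # (input $/1M tokens, output $/1M tokens)
    "opus-4.5":          (5.00, 25.00),
    "sonnet-4.5":        (3.00, 15.00),
    "haiku-4.5":         (1.00, 5.00),
    "gemini-flash-lite": (0.50, 0.50),
    "deepseek-v3.2":     (0.30, 1.20),
}

def request_cost(model, input_tokens=4_000, output_tokens=1_000):
    """Dollar cost of a single request for a given model."""
    inp, out = PRICES[model]
    return input_tokens / 1e6 * inp + output_tokens / 1e6 * out

for model in PRICES:
    print(f"{model:18s} ${request_cost(model):.4f}/request")
```

For this request shape, Sonnet works out to about $0.027 and Haiku to about $0.009, which is the 3x gap from the table applied to a concrete task.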
The Hidden 93.5% Token Waste Bug
This is the single biggest cost driver most users don't know about.
OpenClaw currently injects workspace files (AGENTS.md, SOUL.md, USER.md, etc.) into the system prompt on every single message. That's ~35,600 tokens per message, $1.51 wasted per 100-message session, and 93.5% of your token budget on static content that never changes.
There's a proposed fix in the pipeline: a config option to change injection from "always" to "first-message-only." Until that ships, the workaround I've been using is straightforward: tell your agent exactly what to load.
Part 1: Session Initialization
Your agent loads 50KB of history on every message. This wastes 2-3M tokens per session and costs $4/day. Third-party interfaces without built-in session clearing make this worse.
One user reported their main session occupied 56-58% of the 400K context window — over 200K tokens consumed before they typed a single word.
The Fix
Add this session initialization rule to your agent's system prompt. It tells the agent exactly what to load and what to skip:
SESSION INITIALIZATION RULE:
On every session start:
1. Load ONLY these files:
- SOUL.md
- USER.md
- IDENTITY.md
- memory/YYYY-MM-DD.md (if it exists)
2. DO NOT auto-load:
- MEMORY.md
- Session history
- Prior messages
- Previous tool outputs
3. When user asks about prior context:
- Use memory_search() on demand
- Pull only the relevant snippet with memory_get()
- Don't load the whole file
4. Update memory/YYYY-MM-DD.md at end of session with:
- What you worked on
- Decisions made
- Leads generated
- Blockers
- Next steps
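The on-demand retrieval in step 3 can be sketched like this. `memory_search()` and `memory_get()` here are toy stand-ins for whatever retrieval tools your agent actually exposes; the point is returning locations and small windows instead of whole files:

```python
import re
from pathlib import Path

MEMORY_DIR = Path("memory")  # daily notes, e.g. memory/2026-02-03.md

def memory_search(query: str, root: Path = MEMORY_DIR) -> list[tuple[str, int]]:
    """Return (filename, line_number) hits rather than whole files,
    so the agent only pulls the snippets it actually needs."""
    hits = []
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    for note in sorted(root.glob("*.md")):
        for i, line in enumerate(note.read_text().splitlines(), 1):
            if pattern.search(line):
                hits.append((note.name, i))
    return hits

def memory_get(filename: str, line: int, context: int = 2,
               root: Path = MEMORY_DIR) -> str:
    """Fetch a small window around one hit, not the whole file."""
    lines = (root / filename).read_text().splitlines()
    lo, hi = max(0, line - 1 - context), line + context
    return "\n".join(lines[lo:hi])
```

A search for "blocker" comes back as a file and line number, and `memory_get()` pulls a few surrounding lines, keeping the context cost to a snippet instead of the full memory file.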
Why It Works
Session starts with 8KB instead of 50KB. History loads only when asked. Daily notes become your actual memory. Works with any interface.
Use the /new command to start fresh sessions regularly. Run diagnostic commands in isolated sessions to prevent context bloat in your main workspace. The OpenClaw session management docs cover compaction and pruning strategies in detail.
Before
- 50KB context on startup
- 2-3M tokens wasted per session
- $0.40 per session
- History bloat over time
After
- 8KB context on startup
- Only loads what's needed
- $0.05 per session
- Clean daily memory files
Part 2: Model Routing
Out of the box, OpenClaw defaults to Sonnet for everything. Sonnet is excellent. It's also overkill for checking file status, running simple commands, or routine monitoring.
This is the single highest-impact optimization. Users report 50-80% savings from model routing alone.
After testing model routing across multiple setups, the results were consistent: 50-80% cost reduction just by defaulting to Haiku and reserving Sonnet for complex reasoning. Others in the community are seeing the same thing:
The difference between Sonnet at $0.003/1K input tokens and Haiku at $0.001/1K is 3x, and against Gemini Flash-Lite at $0.0005/1K it's 6x. Most tasks simply don't need frontier-level intelligence.
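Here's that math applied to the "regular" 20M-token/month tier from the cost table above (the 80/20 input/output token split is my assumption):

```python
# Monthly model cost for a "regular" 20M-token workload, assuming
# (my assumption) an 80/20 input/output token split.
def monthly_cost(tokens, input_price, output_price, input_share=0.8):
    inp = tokens * input_share
    out = tokens * (1 - input_share)
    return inp / 1e6 * input_price + out / 1e6 * output_price

sonnet = monthly_cost(20_000_000, 3.00, 15.00)   # ~$108/month
haiku  = monthly_cost(20_000_000, 1.00, 5.00)    # ~$36/month
print(f"Sonnet: ${sonnet:.0f}  Haiku: ${haiku:.0f}  saved: {1 - haiku/sonnet:.0%}")
```

Defaulting this workload to Haiku cuts the bill by about two thirds before you route a single complex task, which is consistent with the 50-80% range users report.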
Step 1: Update Your Config
Your OpenClaw config file lives at ~/.openclaw/openclaw.json:
{
"agents": {
"defaults": {
"model": {
"primary": "anthropic/claude-haiku-4-5"
},
"models": {
"anthropic/claude-sonnet-4-5": {
"alias": "sonnet"
},
"anthropic/claude-haiku-4-5": {
"alias": "haiku"
}
}
}
}
}
This sets Haiku as the default and creates aliases so your prompts can say "use sonnet" or "use haiku" to switch on-demand.
Step 2: Add Routing Rules to System Prompt
MODEL SELECTION RULE:
Default: Always use Haiku
Switch to Sonnet ONLY when:
- Architecture decisions
- Production code review
- Security analysis
- Complex debugging/reasoning
- Strategic multi-project decisions
When in doubt: Try Haiku first.
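If you want a feel for how routing logic like this behaves, here's a minimal keyword heuristic. It's a sketch, not OpenClaw's actual mechanism; in practice the system-prompt rule or the proposed middleware hook does the classification, and real routing would be richer than substring matching:

```python
# Trigger words drawn from the escalation list above; everything
# else defaults to Haiku.
SONNET_TRIGGERS = (
    "architecture", "security", "review", "debug",
    "refactor", "design decision", "strategy",
)

def pick_model(task: str) -> str:
    """Default to Haiku; escalate to Sonnet only on trigger words."""
    lowered = task.lower()
    if any(trigger in lowered for trigger in SONNET_TRIGGERS):
        return "anthropic/claude-sonnet-4-5"
    return "anthropic/claude-haiku-4-5"
```

Routine tasks like "check file status" stay on Haiku, while anything mentioning security or architecture escalates, which mirrors the "when in doubt, try Haiku first" rule.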
If you want automated routing, OpenRouter's Auto Model (openrouter/openrouter/auto) automatically selects the most cost-effective model for each request. There's also a proposed middleware hook (Issue #10969) for dynamic model routing based on message classification.
The Ultra-Cheap Tier
For tasks that barely need intelligence (heartbeats, pings, simple lookups), you can go even cheaper than Haiku:
| Model | Blended cost per 1M tokens | vs Opus (input + output) |
|---|---|---|
| Gemini Flash-Lite | $0.50 | 60x cheaper |
| DeepSeek V3.2 | $0.53 | 56x cheaper |
| GPT-4o-mini | $0.60 | 50x cheaper |
Before
- Sonnet for everything
- $3.00 per 1M input tokens
- Overkill for simple tasks
- $50-70/month on models
After
- Haiku by default, Sonnet when needed
- $1.00 per 1M input tokens
- Right model for the job
- $5-15/month on models
Part 3: Heartbeat to Ollama
OpenClaw sends periodic heartbeat checks to verify your agent is running. By default, these use your paid API. Running 24/7, that's real money for a health check.
One user set email checks every 5 minutes via heartbeat. Each heartbeat included the full session context. Result: $50/day from heartbeats alone.
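The arithmetic behind a bill like that is easy to reproduce. The ~58K-token context size is my estimate of what lines up with $50/day; the 5-minute interval and Sonnet input pricing come from the report and the table above:

```python
# Back-of-envelope for the $50/day heartbeat bill (context size is
# my estimate): a 5-minute interval means 288 heartbeats/day, each
# resending the full session context at Sonnet input pricing.
heartbeats_per_day = 24 * 60 // 5      # 288 checks/day
context_tokens = 58_000                # full session resent every time
sonnet_input_per_m = 3.00              # $/1M input tokens

daily = heartbeats_per_day * context_tokens / 1e6 * sonnet_input_per_m
print(f"${daily:.2f}/day")             # ≈ $50/day
```

Note that the model never does anything useful in these calls; the entire cost is re-uploading context the agent already had.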
Source: Running OpenClaw Without Burning Money (GitHub Gist)
The fix: route heartbeats to a free local LLM using Ollama. This is fully documented in the OpenClaw docs.
Step 1: Install Ollama
# macOS / Linux
curl -fsSL https://ollama.ai/install.sh | sh
# Pull a lightweight model for heartbeats
ollama pull llama3.2:3b
llama3.2:3b (2GB) balances size and capability. For function calling, the OpenClaw docs recommend qwen2.5-coder or qwen3. Minimum context requirement is 64K tokens. Streaming is disabled by default for Ollama due to SDK response format issues.
Step 2: Configure OpenClaw for Ollama Heartbeat
{
"agents": {
"defaults": {
"model": {
"primary": "anthropic/claude-haiku-4-5"
},
"models": {
"anthropic/claude-sonnet-4-5": { "alias": "sonnet" },
"anthropic/claude-haiku-4-5": { "alias": "haiku" }
}
}
},
"heartbeat": {
"every": "1h",
"model": "ollama/llama3.2:3b",
"session": "main",
"target": "slack",
"prompt": "Check: Any blockers, opportunities, or progress updates needed?"
}
}
Step 3: Verify
# Make sure Ollama is running
ollama serve
# In another terminal, test the model
ollama run llama3.2:3b "respond with OK"
# Should respond quickly with "OK"
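You can also verify from a script by hitting Ollama's local REST API (`POST http://localhost:11434/api/generate`), which is what the CLI wraps:

```python
import json
import urllib.request

def ollama_payload(model: str, prompt: str) -> bytes:
    """Build the request body for Ollama's /api/generate endpoint.
    stream=False returns one JSON object instead of chunked lines."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

def check_heartbeat_model(model: str = "llama3.2:3b") -> str:
    """Equivalent of `ollama run llama3.2:3b "respond with OK"`."""
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=ollama_payload(model, "respond with OK"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read())["response"]
```

If this call fails with a connection error, `ollama serve` isn't running, which is the same failure mode the troubleshooting table below flags for heartbeat errors.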
Before
- Heartbeats use paid API
- Each heartbeat = full context resent
- $5-50/month on heartbeats
- Adds to rate limit usage
After
- Heartbeats use free local LLM
- Zero API calls for heartbeats
- $0/month for heartbeats
- No impact on rate limits
Part 4: Rate Limits & Budget Controls
Even with model routing and optimized sessions, runaway automation can still burn through tokens. These rate limits act as guardrails.
I've seen multiple reports in GitHub discussions of users waking up to $200-500 bills from agents stuck in repetitive task loops overnight. One common pattern: a failed operation the agent keeps retrying indefinitely. Without rate limits, there's nothing to stop the bleeding.
Add to Your System Prompt
RATE LIMITS:
- 5 seconds minimum between API calls
- 10 seconds between web searches
- Max 5 searches per batch, then 2-minute break
- Batch similar work (one request for 10 leads, not 10 requests)
- If you hit 429 error: STOP, wait 5 minutes, retry
DAILY BUDGET: $5 (warning at 75%)
MONTHLY BUDGET: $200 (warning at 75%)
| Limit | What It Prevents |
|---|---|
| 5s between API calls | Rapid-fire requests that burn tokens |
| 10s between searches | Expensive search loops |
| 5 searches max, then break | Runaway research tasks |
| Batch similar work | 10 calls when 1 would do |
| Budget warnings at 75% | Surprise bills at end of month |
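To make the limits concrete, here's a minimal client-side throttle implementing the first three rules. It's a sketch; in OpenClaw these limits live in the system prompt, not in code you run:

```python
# A toy throttle for the rate limits above: 5s between calls,
# a 2-minute break after every batch of 5. On a 429, the rule
# says stop entirely and wait 5 minutes before retrying.
class Throttle:
    def __init__(self, min_interval=5.0, batch_size=5, batch_break=120.0):
        self.min_interval = min_interval
        self.batch_size = batch_size
        self.batch_break = batch_break
        self.last_call = 0.0
        self.calls_in_batch = 0

    def wait_time(self, now: float) -> float:
        """Seconds to sleep before the next call is allowed."""
        if self.calls_in_batch >= self.batch_size:
            self.calls_in_batch = 0        # batch done: take the long break
            return self.batch_break
        return max(0.0, self.last_call + self.min_interval - now)

    def record(self, now: float) -> None:
        """Note that a call was just made."""
        self.last_call = now
        self.calls_in_batch += 1
```

The same shape works for the search limits: swap in a 10-second interval and the 5-search batch cap.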
Workspace File Templates
Keep these lean. Every line costs tokens on every request.
# SOUL.md
## Core Principles
[YOUR AGENT PRINCIPLES HERE]
## Model Selection
Default: Haiku
Switch to Sonnet only for: architecture, security, complex reasoning
## Rate Limits
5s between API calls, 10s between searches, max 5/batch then 2min break
# USER.md
- **Name:** [YOUR NAME]
- **Timezone:** [YOUR TIMEZONE]
- **Mission:** [WHAT YOU'RE BUILDING]
## Success Metrics
- [METRIC 1]
- [METRIC 2]
- [METRIC 3]
Part 5: Prompt Caching
Your system prompt, workspace files, and reference materials get sent with every API call. Prompt caching gives you a 90% discount on repeated content.
How It Actually Works
| Event | Cost | Notes |
|---|---|---|
| Cache write (5-min TTL) | 1.25x base input price | Slight premium on first send |
| Cache write (1-hour TTL) | 2x base input price | $6/M for Sonnet vs $3/M base |
| Cache read (hit) | 0.1x base input price | 90% discount |
| Cache miss | 1x base input price | Full price, cache expired |
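From that table, a cache write costs a 25% premium and every hit saves 90%, so caching pays for itself after a single reuse. A quick check:

```python
# Break-even check from the table above: write = 1.25x base input
# price, hit = 0.1x. Ratio below 1.0 means caching is cheaper.
def cached_cost_ratio(reuses: int) -> float:
    """Cost of 1 cache write + N cached reads, relative to
    sending the same content uncached N+1 times."""
    cached = 1.25 + 0.1 * reuses
    uncached = 1.0 + 1.0 * reuses
    return cached / uncached

print(f"{cached_cost_ratio(1):.0%}")   # 2 sends: ~68% of full price
print(f"{cached_cost_ratio(49):.0%}")  # 50 sends: ~12% of full price
```

Even one reuse within the TTL beats not caching, and the ratio approaches the 10% floor as reuse count grows, which is why keeping the cache warm matters.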
Configure your heartbeat interval to be slightly less than the cache TTL. Example: cache TTL = 1 hour, heartbeat every 55 minutes. This keeps your cache warm and prevents expensive cold starts. Source: Apiyi token guide
What to Cache vs Skip
Cache These (Stable)
- System prompts (SOUL.md, USER.md)
- Tool documentation (TOOLS.md)
- Reference materials (docs, specs)
- Project templates
Skip These (Dynamic)
- Daily memory files
- Recent user messages
- Tool outputs (change per task)
- Active project notes
File Structure for Maximum Cache Hits
/workspace/
├── SOUL.md ← Cache (stable)
├── USER.md ← Cache (stable)
├── TOOLS.md ← Cache (stable)
├── memory/
│ ├── MEMORY.md ← Don't cache (updated frequently)
│ └── 2026-02-03.md ← Don't cache (daily notes)
└── projects/
└── [PROJECT]/REFERENCE.md ← Cache (stable docs)
Enable in Config
{
"agents": {
"defaults": {
"cache": {
"enabled": true,
"ttl": "5m",
"priority": "high"
},
"models": {
"anthropic/claude-sonnet-4-5": {
"alias": "sonnet",
"cache": true
},
"anthropic/claude-haiku-4-5": {
"alias": "haiku",
"cache": false
}
}
}
}
}
The Claude Batch API gives a flat 50% discount on both input and output tokens. Batches typically finish in under 1 hour. You can combine batch pricing with prompt caching for even deeper savings. Ideal for background tasks and bulk operations where immediate response isn't critical.
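Assuming the two discounts stack multiplicatively (my reading of "combine batch pricing with prompt caching"; worth verifying against current pricing docs), the combined effect on input pricing looks like this:

```python
# Effective input price when stacking the flat 50% Batch API
# discount with cache reads (0.1x base input price). Multiplicative
# stacking is an assumption, not a documented guarantee.
def effective_input_price(base: float, batched=False, cached=False) -> float:
    price = base
    if cached:
        price *= 0.10   # cache hit: 90% off input
    if batched:
        price *= 0.50   # Batch API: flat 50% off
    return price

sonnet = 3.00
print(effective_input_price(sonnet, batched=True, cached=True))  # ≈ $0.15/M
```

Under that assumption, Sonnet input drops from $3.00/M to around $0.15/M for cached, batched content, cheaper than uncached DeepSeek.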
Real-World Example: 50 Outreach Drafts/Week
| Metric | Without Caching | With Caching (Batched) |
|---|---|---|
| System prompt | 5KB x 50 = 250KB/week | 1 write + 49 cached |
| System prompt cost | $0.75/week | $0.016/week |
| 50 drafts | $1.20/week | $0.60/week (~50% cache hits) |
| Total | $1.95/week (~$8.50/mo) | $0.62/week (~$2.70/mo) |
Savings: roughly two thirds of the cost of this one workflow. Multiply that across every recurring prompt in your setup for the real monthly impact.
When NOT to Cache
- Haiku tasks — too cheap to justify caching overhead.
- Frequent prompt changes — cache invalidation costs more than caching saves.
- Small requests (<1KB) — caching overhead eats the discount.
- Development/testing — too many prompt iterations cause cache thrashing.
Part 6: Monitoring & Observability
You can't optimize what you can't measure. These tools show where your tokens actually go.
| Tool | What It Does | Link |
|---|---|---|
| Tapes | Records every API call. Full visibility into prompts, token usage, agent behavior. | Built-in |
| OpenTelemetry | Built-in OTEL support (v2026.2+). Metrics: openclaw.tokens, openclaw.cost.usd, openclaw.context.tokens | GitHub |
| TokScale | CLI tool for real-time token usage tracking with pricing breakdowns. | GitHub |
| Portkey | Request logs, cost tracking, automatic failovers, team controls. | Docs |
Verify Your Setup
# Start a session
openclaw shell
# Check current status
session_status
# You should see:
# - Context size: 2-8KB (not 50KB+)
# - Model: Haiku (not Sonnet)
# - Heartbeat: Ollama/local
Cache Performance Check
# Check cache effectiveness
openclaw shell
session_status
# Look for cache metrics:
# Cache hits: 45/50 (90%)
# Cache tokens used: 225KB (vs 250KB without cache)
# Cost savings: $0.22 this session
Troubleshooting
| Issue | Fix |
|---|---|
| Context size still large | Check session initialization rules are in system prompt |
| Still using Sonnet for everything | Verify ~/.openclaw/openclaw.json syntax and path |
| Heartbeat errors | Make sure Ollama is running (ollama serve) |
| Costs haven't dropped | Use TokScale or Tapes to see where tokens actually go |
| Cache hit rate below 80% | System prompt is changing too often. Batch updates to maintenance windows. |
Quick Reference Checklist
- Session initialization rule in system prompt (load only SOUL.md, USER.md, IDENTITY.md, and the daily memory file)
- Haiku set as default model, with Sonnet aliased for complex work
- Heartbeats routed to a local Ollama model
- Rate limits and daily/monthly budget caps in system prompt
- Prompt caching enabled for stable workspace files
- TokScale or Tapes running so you can see where tokens go
The Bottom Line
No complex infrastructure changes needed. After testing all of these across multiple setups, the pattern is clear: smart config, clear rules in your system prompt, a free local LLM for heartbeats, and monitoring to know where your tokens go.
| Optimization | Savings | Effort |
|---|---|---|
| Session Initialization | 80% less context overhead | 5 min (system prompt edit) |
| Model Routing (Haiku default) | 50-80% on model costs | 5 min (config change) |
| Heartbeat to Ollama | 100% heartbeat costs gone | 10 min (install + config) |
| Rate Limits | Prevents $200+ runaway bills | 5 min (system prompt edit) |
| Prompt Caching | 90% on repeated content | 5 min (config change) |
| Monitoring (TokScale/Tapes) | Visibility into actual spend | 5 min (install) |
Combined result: from $1,500+/month down to $30-50/month.
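That headline reduction checks out arithmetically:

```python
# Sanity check on the 97% claim, using the midpoint of $30-50/month.
before, after = 1500, 45   # $/month
reduction = 1 - after / before
print(f"{reduction:.0%}")  # 97%
```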
The intelligence is in the prompt, not the infrastructure.