Reduce Your OpenClaw AI Costs by 97%
I dug into GitHub issues, API pricing docs, and community forums to find every optimization that actually works. Six config-level changes that took my setup from $1,500+/month projections down to under $50.
The Real Cost Numbers
I spent weeks digging through GitHub issues, Discord threads, and real billing data to understand where OpenClaw costs actually come from. The software itself is free and MIT-licensed. The cost is entirely in API calls. Here's what the data shows.
While researching costs across community forums and GitHub discussions, I found users reporting 180 million tokens burned in a single month, roughly $3,600 in API calls for what's supposed to be a "free" tool. And that's not an isolated case. Here's what I found across community reports:
| Usage Level | Monthly Cost | Tokens/Month |
|---|---|---|
| Light (1-2 hrs/day, simple tasks) | $10-30 | ~5M |
| Regular (daily development) | $40-80 | ~20M |
| Power user (complex automations) | $100-200 | ~50M |
| Extreme (heavy automation, 24/7) | $3,600 | 180M |
90% of users spend under $12/day. But the tail is long. One misconfigured heartbeat can burn $50/day. An automation loop left running overnight can rack up $200-500.
The optimizations below target specific cost drivers. Each one is independent. Start with the one that matches your biggest spending category.
Current Claude API Pricing (Feb 2026)
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Best For |
|---|---|---|---|
| Opus 4.5 | $5.00 | $25.00 | Complex reasoning, architecture |
| Sonnet 4.5 | $3.00 | $15.00 | General development, code review |
| Haiku 4.5 | $1.00 | $5.00 | Routine tasks, status checks |
| Gemini Flash-Lite | $0.50 | $0.50 | Heartbeats, pings |
| DeepSeek V3.2 | $0.30 | $1.20 | Simple completions |
The gap between Sonnet and Haiku is 3x on both input and output. For tasks that don't need Sonnet's reasoning, you're paying triple for nothing. And DeepSeek V3.2's input price is roughly a tenth of Sonnet's for simple completions.
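To see what that spread means per request, here's a quick back-of-envelope using the table above (the 4K-input/1K-output request size is my assumption, not a measured average):

```python
# Rough per-request cost comparison from the pricing table above.
PRICES = {  # (input $/1M tokens, output $/1M tokens)
    "opus-4.5":          (5.00, 25.00),
    "sonnet-4.5":        (3.00, 15.00),
    "haiku-4.5":         (1.00, 5.00),
    "gemini-flash-lite": (0.50, 0.50),
    "deepseek-v3.2":     (0.30, 1.20),
}

def request_cost(model, input_tokens=4_000, output_tokens=1_000):
    """Dollar cost of a single request for a given model."""
    inp, out = PRICES[model]
    return input_tokens / 1e6 * inp + output_tokens / 1e6 * out

for model in PRICES:
    print(f"{model:18s} ${request_cost(model):.4f}/request")
```

For this request shape, Sonnet works out to about $0.027 and Haiku to about $0.009, which is the 3x gap from the table applied to a concrete task.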
The Hidden 93.5% Token Waste Bug
This is the single biggest cost driver most users don't know about.
OpenClaw currently injects workspace files (AGENTS.md, SOUL.md, USER.md, etc.) into the system prompt on every single message. That's ~35,600 tokens per message, $1.51 wasted per 100-message session, and 93.5% of your token budget on static content that never changes.
There's a proposed fix in the pipeline: a config option to change injection from "always" to "first-message-only." Until that ships, the workaround I've been using is straightforward: tell your agent exactly what to load.
Part 1: Session Initialization
Your agent loads 50KB of history on every message. This wastes 2-3M tokens per session and costs $4/day. Third-party interfaces without built-in session clearing make this worse.
One user reported their main session occupied 56-58% of the 400K context window — over 200K tokens consumed before they typed a single word.
The Fix
Add this session initialization rule to your agent's system prompt. It tells the agent exactly what to load and what to skip:
SESSION INITIALIZATION RULE:
On every session start:
1. Load ONLY these files:
- SOUL.md
- USER.md
- IDENTITY.md
- memory/YYYY-MM-DD.md (if it exists)
2. DO NOT auto-load:
- MEMORY.md
- Session history
- Prior messages
- Previous tool outputs
3. When user asks about prior context:
- Use memory_search() on demand
- Pull only the relevant snippet with memory_get()
- Don't load the whole file
4. Update memory/YYYY-MM-DD.md at end of session with:
- What you worked on
- Decisions made
- Leads generated
- Blockers
- Next steps
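The on-demand retrieval in step 3 can be sketched like this. `memory_search()` and `memory_get()` here are toy stand-ins for whatever retrieval tools your agent actually exposes; the point is returning locations and small windows instead of whole files:

```python
import re
from pathlib import Path

MEMORY_DIR = Path("memory")  # daily notes, e.g. memory/2026-02-03.md

def memory_search(query: str, root: Path = MEMORY_DIR) -> list[tuple[str, int]]:
    """Return (filename, line_number) hits rather than whole files,
    so the agent only pulls the snippets it actually needs."""
    hits = []
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    for note in sorted(root.glob("*.md")):
        for i, line in enumerate(note.read_text().splitlines(), 1):
            if pattern.search(line):
                hits.append((note.name, i))
    return hits

def memory_get(filename: str, line: int, context: int = 2,
               root: Path = MEMORY_DIR) -> str:
    """Fetch a small window around one hit, not the whole file."""
    lines = (root / filename).read_text().splitlines()
    lo, hi = max(0, line - 1 - context), line + context
    return "\n".join(lines[lo:hi])
```

A search for "blocker" comes back as a file and line number, and `memory_get()` pulls a few surrounding lines, keeping the context cost to a snippet instead of the full memory file.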
Why It Works
Session starts with 8KB instead of 50KB. History loads only when asked. Daily notes become your actual memory. Works with any interface.
Use the /new command to start fresh sessions regularly. Run diagnostic commands in isolated sessions to prevent context bloat in your main workspace. The OpenClaw session management docs cover compaction and pruning strategies in detail.
Before
- 50KB context on startup
- 2-3M tokens wasted per session
- $0.40 per session
- History bloat over time
After
- 8KB context on startup
- Only loads what's needed
- $0.05 per session
- Clean daily memory files
Part 2: Model Routing
Out of the box, OpenClaw defaults to Sonnet for everything. Sonnet is excellent. It's also overkill for checking file status, running simple commands, or routine monitoring.
This is the single highest-impact optimization. Users report 50-80% savings from model routing alone.
After testing model routing across multiple setups, the results were consistent: 50-80% cost reduction just by defaulting to Haiku and reserving Sonnet for complex reasoning. Others in the community are seeing the same thing:
The difference between Sonnet at $0.003/1K input tokens and Haiku at $0.001/1K is 3x, and against Gemini Flash-Lite at $0.0005/1K it's 6x. Most tasks simply don't need frontier-level intelligence.
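Here's that math applied to the "regular" 20M-token/month tier from the cost table above (the 80/20 input/output token split is my assumption):

```python
# Monthly model cost for a "regular" 20M-token workload, assuming
# (my assumption) an 80/20 input/output token split.
def monthly_cost(tokens, input_price, output_price, input_share=0.8):
    inp = tokens * input_share
    out = tokens * (1 - input_share)
    return inp / 1e6 * input_price + out / 1e6 * output_price

sonnet = monthly_cost(20_000_000, 3.00, 15.00)   # ~$108/month
haiku  = monthly_cost(20_000_000, 1.00, 5.00)    # ~$36/month
print(f"Sonnet: ${sonnet:.0f}  Haiku: ${haiku:.0f}  saved: {1 - haiku/sonnet:.0%}")
```

Defaulting this workload to Haiku cuts the bill by about two thirds before you route a single complex task, which is consistent with the 50-80% range users report.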
Step 1: Update Your Config
Your OpenClaw config file lives at ~/.openclaw/openclaw.json:
{
"agents": {
"defaults": {
"model": {
"primary": "anthropic/claude-haiku-4-5"
},
"models": {
"anthropic/claude-sonnet-4-5": {
"alias": "sonnet"
},
"anthropic/claude-haiku-4-5": {
"alias": "haiku"
}
}
}
}
}
This sets Haiku as the default and creates aliases so your prompts can say "use sonnet" or "use haiku" to switch on-demand.
Step 2: Add Routing Rules to System Prompt
MODEL SELECTION RULE:
Default: Always use Haiku
Switch to Sonnet ONLY when:
- Architecture decisions
- Production code review
- Security analysis
- Complex debugging/reasoning
- Strategic multi-project decisions
When in doubt: Try Haiku first.
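If you want a feel for how routing logic like this behaves, here's a minimal keyword heuristic. It's a sketch, not OpenClaw's actual mechanism; in practice the system-prompt rule or the proposed middleware hook does the classification, and real routing would be richer than substring matching:

```python
# Trigger words drawn from the escalation list above; everything
# else defaults to Haiku.
SONNET_TRIGGERS = (
    "architecture", "security", "review", "debug",
    "refactor", "design decision", "strategy",
)

def pick_model(task: str) -> str:
    """Default to Haiku; escalate to Sonnet only on trigger words."""
    lowered = task.lower()
    if any(trigger in lowered for trigger in SONNET_TRIGGERS):
        return "anthropic/claude-sonnet-4-5"
    return "anthropic/claude-haiku-4-5"
```

Routine tasks like "check file status" stay on Haiku, while anything mentioning security or architecture escalates, which mirrors the "when in doubt, try Haiku first" rule.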
If you want automated routing, OpenRouter's Auto Model (openrouter/openrouter/auto) automatically selects the most cost-effective model for each request. There's also a proposed middleware hook (Issue #10969) for dynamic model routing based on message classification.
The Ultra-Cheap Tier
For tasks that barely need intelligence (heartbeats, pings, simple lookups), you can go even cheaper than Haiku:
| Model | Blended cost per 1M tokens | vs Opus (input + output) |
|---|---|---|
| Gemini Flash-Lite | $0.50 | 60x cheaper |
| DeepSeek V3.2 | $0.53 | 56x cheaper |
| GPT-4o-mini | $0.60 | 50x cheaper |
Before
- Sonnet for everything
- $3.00 per 1M input tokens
- Overkill for simple tasks
- $50-70/month on models
After
- Haiku by default, Sonnet when needed
- $1.00 per 1M input tokens
- Right model for the job
- $5-15/month on models
Part 3: Heartbeat to Ollama
OpenClaw sends periodic heartbeat checks to verify your agent is running. By default, these use your paid API. Running 24/7, that's real money for a health check.
One user set email checks every 5 minutes via heartbeat. Each heartbeat included the full session context. Result: $50/day from heartbeats alone.
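The arithmetic behind a bill like that is easy to reproduce. The ~58K-token context size is my estimate of what lines up with $50/day; the 5-minute interval and Sonnet input pricing come from the report and the table above:

```python
# Back-of-envelope for the $50/day heartbeat bill (context size is
# my estimate): a 5-minute interval means 288 heartbeats/day, each
# resending the full session context at Sonnet input pricing.
heartbeats_per_day = 24 * 60 // 5      # 288 checks/day
context_tokens = 58_000                # full session resent every time
sonnet_input_per_m = 3.00              # $/1M input tokens

daily = heartbeats_per_day * context_tokens / 1e6 * sonnet_input_per_m
print(f"${daily:.2f}/day")             # ≈ $50/day
```

Note that the model never does anything useful in these calls; the entire cost is re-uploading context the agent already had.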
Source: Running OpenClaw Without Burning Money (GitHub Gist)
The fix: route heartbeats to a free local LLM using Ollama. This is fully documented in the OpenClaw docs.
Step 1: Install Ollama
# macOS / Linux
curl -fsSL https://ollama.ai/install.sh | sh
# Pull a lightweight model for heartbeats
ollama pull llama3.2:3b
llama3.2:3b (2GB) balances size and capability. For function calling, the OpenClaw docs recommend qwen2.5-coder or qwen3. Minimum context requirement is 64K tokens. Streaming is disabled by default for Ollama due to SDK response format issues.
Step 2: Configure OpenClaw for Ollama Heartbeat
{
"agents": {
"defaults": {
"model": {
"primary": "anthropic/claude-haiku-4-5"
},
"models": {
"anthropic/claude-sonnet-4-5": { "alias": "sonnet" },
"anthropic/claude-haiku-4-5": { "alias": "haiku" }
}
}
},
"heartbeat": {
"every": "1h",
"model": "ollama/llama3.2:3b",
"session": "main",
"target": "slack",
"prompt": "Check: Any blockers, opportunities, or progress updates needed?"
}
}
Step 3: Verify
# Make sure Ollama is running
ollama serve
# In another terminal, test the model
ollama run llama3.2:3b "respond with OK"
# Should respond quickly with "OK"
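You can also verify from a script by hitting Ollama's local REST API (`POST http://localhost:11434/api/generate`), which is what the CLI wraps:

```python
import json
import urllib.request

def ollama_payload(model: str, prompt: str) -> bytes:
    """Build the request body for Ollama's /api/generate endpoint.
    stream=False returns one JSON object instead of chunked lines."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

def check_heartbeat_model(model: str = "llama3.2:3b") -> str:
    """Equivalent of `ollama run llama3.2:3b "respond with OK"`."""
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=ollama_payload(model, "respond with OK"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read())["response"]
```

If this call fails with a connection error, `ollama serve` isn't running, which is the same failure mode the troubleshooting table below flags for heartbeat errors.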
Before
- Heartbeats use paid API
- Each heartbeat = full context resent
- $5-50/month on heartbeats
- Adds to rate limit usage
After
- Heartbeats use free local LLM
- Zero API calls for heartbeats
- $0/month for heartbeats
- No impact on rate limits
Part 4: Rate Limits & Budget Controls
Even with model routing and optimized sessions, runaway automation can still burn through tokens. These rate limits act as guardrails.
I've seen multiple reports in GitHub discussions of users waking up to $200-500 bills from agents stuck in repetitive task loops overnight. One common pattern: a failed operation the agent keeps retrying indefinitely. Without rate limits, there's nothing to stop the bleeding.
Add to Your System Prompt
RATE LIMITS:
- 5 seconds minimum between API calls
- 10 seconds between web searches
- Max 5 searches per batch, then 2-minute break
- Batch similar work (one request for 10 leads, not 10 requests)
- If you hit 429 error: STOP, wait 5 minutes, retry
DAILY BUDGET: $5 (warning at 75%)
MONTHLY BUDGET: $200 (warning at 75%)
| Limit | What It Prevents |
|---|---|
| 5s between API calls | Rapid-fire requests that burn tokens |
| 10s between searches | Expensive search loops |
| 5 searches max, then break | Runaway research tasks |
| Batch similar work | 10 calls when 1 would do |
| Budget warnings at 75% | Surprise bills at end of month |
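To make the limits concrete, here's a minimal client-side throttle implementing the first three rules. It's a sketch; in OpenClaw these limits live in the system prompt, not in code you run:

```python
# A toy throttle for the rate limits above: 5s between calls,
# a 2-minute break after every batch of 5. On a 429, the rule
# says stop entirely and wait 5 minutes before retrying.
class Throttle:
    def __init__(self, min_interval=5.0, batch_size=5, batch_break=120.0):
        self.min_interval = min_interval
        self.batch_size = batch_size
        self.batch_break = batch_break
        self.last_call = 0.0
        self.calls_in_batch = 0

    def wait_time(self, now: float) -> float:
        """Seconds to sleep before the next call is allowed."""
        if self.calls_in_batch >= self.batch_size:
            self.calls_in_batch = 0        # batch done: take the long break
            return self.batch_break
        return max(0.0, self.last_call + self.min_interval - now)

    def record(self, now: float) -> None:
        """Note that a call was just made."""
        self.last_call = now
        self.calls_in_batch += 1
```

The same shape works for the search limits: swap in a 10-second interval and the 5-search batch cap.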
Workspace File Templates
Keep these lean. Every line costs tokens on every request.
# SOUL.md
## Core Principles
[YOUR AGENT PRINCIPLES HERE]
## Model Selection
Default: Haiku
Switch to Sonnet only for: architecture, security, complex reasoning
## Rate Limits
5s between API calls, 10s between searches, max 5/batch then 2min break
# USER.md
- **Name:** [YOUR NAME]
- **Timezone:** [YOUR TIMEZONE]
- **Mission:** [WHAT YOU'RE BUILDING]
## Success Metrics
- [METRIC 1]
- [METRIC 2]
- [METRIC 3]
Part 5: Prompt Caching
Your system prompt, workspace files, and reference materials get sent with every API call. Prompt caching gives you a 90% discount on repeated content.
How It Actually Works
| Event | Cost | Notes |
|---|---|---|
| Cache write (5-min TTL) | 1.25x base input price | Slight premium on first send |
| Cache write (1-hour TTL) | 2x base input price | $6/M for Sonnet vs $3/M base |
| Cache read (hit) | 0.1x base input price | 90% discount |
| Cache miss | 1x base input price | Full price, cache expired |
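From that table, a cache write costs a 25% premium and every hit saves 90%, so caching pays for itself after a single reuse. A quick check:

```python
# Break-even check from the table above: write = 1.25x base input
# price, hit = 0.1x. Ratio below 1.0 means caching is cheaper.
def cached_cost_ratio(reuses: int) -> float:
    """Cost of 1 cache write + N cached reads, relative to
    sending the same content uncached N+1 times."""
    cached = 1.25 + 0.1 * reuses
    uncached = 1.0 + 1.0 * reuses
    return cached / uncached

print(f"{cached_cost_ratio(1):.0%}")   # 2 sends: ~68% of full price
print(f"{cached_cost_ratio(49):.0%}")  # 50 sends: ~12% of full price
```

Even one reuse within the TTL beats not caching, and the ratio approaches the 10% floor as reuse count grows, which is why keeping the cache warm matters.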
Configure your heartbeat interval to be slightly less than the cache TTL. Example: cache TTL = 1 hour, heartbeat every 55 minutes. This keeps your cache warm and prevents expensive cold starts. Source: Apiyi token guide
What to Cache vs Skip
Cache These (Stable)
- System prompts (SOUL.md, USER.md)
- Tool documentation (TOOLS.md)
- Reference materials (docs, specs)
- Project templates
Skip These (Dynamic)
- Daily memory files
- Recent user messages
- Tool outputs (change per task)
- Active project notes
File Structure for Maximum Cache Hits
/workspace/
├── SOUL.md ← Cache (stable)
├── USER.md ← Cache (stable)
├── TOOLS.md ← Cache (stable)
├── memory/
│ ├── MEMORY.md ← Don't cache (updated frequently)
│ └── 2026-02-03.md ← Don't cache (daily notes)
└── projects/
└── [PROJECT]/REFERENCE.md ← Cache (stable docs)
Enable in Config
{
"agents": {
"defaults": {
"cache": {
"enabled": true,
"ttl": "5m",
"priority": "high"
},
"models": {
"anthropic/claude-sonnet-4-5": {
"alias": "sonnet",
"cache": true
},
"anthropic/claude-haiku-4-5": {
"alias": "haiku",
"cache": false
}
}
}
}
}
The Claude Batch API gives a flat 50% discount on both input and output tokens. Batches typically finish in under 1 hour. You can combine batch pricing with prompt caching for even deeper savings. Ideal for background tasks and bulk operations where immediate response isn't critical.
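Assuming the two discounts stack multiplicatively (my reading of "combine batch pricing with prompt caching"; worth verifying against current pricing docs), the combined effect on input pricing looks like this:

```python
# Effective input price when stacking the flat 50% Batch API
# discount with cache reads (0.1x base input price). Multiplicative
# stacking is an assumption, not a documented guarantee.
def effective_input_price(base: float, batched=False, cached=False) -> float:
    price = base
    if cached:
        price *= 0.10   # cache hit: 90% off input
    if batched:
        price *= 0.50   # Batch API: flat 50% off
    return price

sonnet = 3.00
print(effective_input_price(sonnet, batched=True, cached=True))  # ≈ $0.15/M
```

Under that assumption, Sonnet input drops from $3.00/M to around $0.15/M for cached, batched content, cheaper than uncached DeepSeek.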
Real-World Example: 50 Outreach Drafts/Week
| Metric | Without Caching | With Caching (Batched) |
|---|---|---|
| System prompt | 5KB x 50 = 250KB/week | 1 write + 49 cached |
| System prompt cost | $0.75/week | $0.016/week |
| 50 drafts | $1.20/week | $0.60/week (~50% cache hits) |
| Total | $1.95/week (~$8.50/mo) | $0.62/week (~$2.70/mo) |
Savings: roughly two thirds of the cost of this one workflow. Multiply that across every recurring prompt in your setup for the real monthly impact.
When NOT to Cache
- Haiku tasks — too cheap to justify caching overhead.
- Frequent prompt changes — cache invalidation costs more than caching saves.
- Small requests (<1KB) — caching overhead eats the discount.
- Development/testing — too many prompt iterations cause cache thrashing.
Part 6: Monitoring & Observability
You can't optimize what you can't measure. These tools show where your tokens actually go.
| Tool | What It Does | Link |
|---|---|---|
| Tapes | Records every API call. Full visibility into prompts, token usage, agent behavior. | Built-in |
| OpenTelemetry | Built-in OTEL support (v2026.2+). Metrics: openclaw.tokens, openclaw.cost.usd, openclaw.context.tokens | GitHub |
| TokScale | CLI tool for real-time token usage tracking with pricing breakdowns. | GitHub |
| Portkey | Request logs, cost tracking, automatic failovers, team controls. | Docs |
Verify Your Setup
# Start a session
openclaw shell
# Check current status
session_status
# You should see:
# - Context size: 2-8KB (not 50KB+)
# - Model: Haiku (not Sonnet)
# - Heartbeat: Ollama/local
Cache Performance Check
# Check cache effectiveness
openclaw shell
session_status
# Look for cache metrics:
# Cache hits: 45/50 (90%)
# Cache tokens used: 225KB (vs 250KB without cache)
# Cost savings: $0.22 this session
Troubleshooting
| Issue | Fix |
|---|---|
| Context size still large | Check session initialization rules are in system prompt |
| Still using Sonnet for everything | Verify ~/.openclaw/openclaw.json syntax and path |
| Heartbeat errors | Make sure Ollama is running (ollama serve) |
| Costs haven't dropped | Use TokScale or Tapes to see where tokens actually go |
| Cache hit rate below 80% | System prompt is changing too often. Batch updates to maintenance windows. |
Quick Reference Checklist
- Session initialization rule in system prompt (load only SOUL.md, USER.md, IDENTITY.md, and the daily memory file)
- Haiku set as default model, with Sonnet aliased for complex work
- Heartbeats routed to a local Ollama model
- Rate limits and daily/monthly budget caps in system prompt
- Prompt caching enabled for stable workspace files
- TokScale or Tapes running so you can see where tokens go
The Bottom Line
No complex infrastructure changes needed. After testing all of these across multiple setups, the pattern is clear: smart config, clear rules in your system prompt, a free local LLM for heartbeats, and monitoring to know where your tokens go.
| Optimization | Savings | Effort |
|---|---|---|
| Session Initialization | 80% less context overhead | 5 min (system prompt edit) |
| Model Routing (Haiku default) | 50-80% on model costs | 5 min (config change) |
| Heartbeat to Ollama | 100% heartbeat costs gone | 10 min (install + config) |
| Rate Limits | Prevents $200+ runaway bills | 5 min (system prompt edit) |
| Prompt Caching | 90% on repeated content | 5 min (config change) |
| Monitoring (TokScale/Tapes) | Visibility into actual spend | 5 min (install) |
Combined result: from $1,500+/month down to $30-50/month.
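That headline reduction checks out arithmetically:

```python
# Sanity check on the 97% claim, using the midpoint of $30-50/month.
before, after = 1500, 45   # $/month
reduction = 1 - after / before
print(f"{reduction:.0%}")  # 97%
```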
The intelligence is in the prompt, not the infrastructure.