Guide

Reduce Your OpenClaw AI Costs by 97%

I dug into GitHub issues, API pricing docs, and community forums to find every optimization that actually works. Six config-level changes that took my setup from $1,500+/month projections down to under $50.

Topics: Token Optimization · Cost Reduction · Model Routing · Prompt Caching · Ollama

97% Token Reduction · ~$50 Target Monthly Cost · $0 Local Heartbeat · 90% Cache Discount

The Real Cost Numbers

I spent weeks digging through GitHub issues, Discord threads, and real billing data to understand where OpenClaw costs actually come from. The software itself is free and MIT-licensed. The cost is entirely in API calls. Here's what the data shows.

While researching costs across community forums and GitHub discussions, I found users reporting 180 million tokens burned in a single month — roughly $3,600 in API costs for what's supposed to be a "free" tool. Here's one example that caught my attention:

Federico Viticci
@viticci
180 million tokens in a month with OpenClaw. Roughly $3,600 in API costs.
Jan 2026

This isn't an outlier. Here's what I found across community reports:

Usage Level                       | Monthly Cost | Tokens/Month
Light (1-2 hrs/day, simple tasks) | $10-30       | ~5M
Regular (daily development)       | $40-80       | ~20M
Power user (complex automations)  | $100-200     | ~50M
Extreme (heavy automation, 24/7)  | $3,600       | 180M
🔗
I Tried OpenClaw, The 'Free' AI Agent. Here's My $500 Reality Check
dev.to/thegdsks
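As a sanity check on these community reports, you can back out the blended per-token rate they imply. This is a rough estimate (the real input/output mix behind each report is unknown); the dollar figures below use the midpoints of the ranges in the table:

```python
# Back out the blended $/1M-token rate implied by community cost reports.
reports = {
    "Light":   (20, 5),      # ($/month midpoint, millions of tokens/month)
    "Regular": (60, 20),
    "Power":   (150, 50),
    "Extreme": (3600, 180),
}

# Blended rate = dollars per 1M tokens across input and output combined.
rates = {name: dollars / m_tokens for name, (dollars, m_tokens) in reports.items()}

for name, rate in rates.items():
    print(f"{name:8s} ~${rate:.2f}/M tokens blended")
```

The extreme case works out to $20 per million tokens blended, which is consistent with a Sonnet-heavy, output-heavy workload at the prices listed in the next section.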

90% of users spend under $12/day. But the tail is long. One misconfigured heartbeat can burn $50/day. An automation loop left running overnight can rack up $200-500.

The optimizations below target specific cost drivers. Each one is independent. Start with the one that matches your biggest spending category.

Current Claude API Pricing (Feb 2026)

Model             | Input (per 1M tokens) | Output (per 1M tokens) | Best For
Opus 4.5          | $5.00                 | $25.00                 | Complex reasoning, architecture
Sonnet 4.5        | $3.00                 | $15.00                 | General development, code review
Haiku 4.5         | $1.00                 | $5.00                  | Routine tasks, status checks
Gemini Flash-Lite | $0.50                 | $0.50                  | Heartbeats, pings
DeepSeek V3.2     | $0.30                 | $1.20                  | Simple completions
🔗
Anthropic API Pricing 2026: Complete Cost Breakdown
metacto.com

The gap between Sonnet and Haiku is 3x on both input and output. For tasks that don't need Sonnet's reasoning, you're paying triple for nothing. And DeepSeek V3.2 runs at a small fraction of frontier-model prices for simple completions.
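To make the gap concrete, here's the cost of the same hypothetical request (20K input tokens, 4K output tokens) at each price point from the table above:

```python
# Per-1M-token prices (input, output) taken from the pricing table above.
PRICING = {
    "Opus 4.5":          (5.00, 25.00),
    "Sonnet 4.5":        (3.00, 15.00),
    "Haiku 4.5":         (1.00, 5.00),
    "Gemini Flash-Lite": (0.50, 0.50),
    "DeepSeek V3.2":     (0.30, 1.20),
}

def task_cost(model, input_tokens, output_tokens):
    """Dollars for one request at the listed prices."""
    p_in, p_out = PRICING[model]
    return (input_tokens * p_in + output_tokens * p_out) / 1_000_000

# The same 20K-in / 4K-out request on each model:
for model in PRICING:
    print(f"{model:18s} ${task_cost(model, 20_000, 4_000):.4f}")
```

On those numbers the request costs $0.12 on Sonnet and $0.04 on Haiku: the 3x ratio holds request-for-request, not just on the rate card.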


The Hidden 93.5% Token Waste Bug

This is the single biggest cost driver most users don't know about.

GitHub Issue #9157

OpenClaw currently injects workspace files (AGENTS.md, SOUL.md, USER.md, etc.) into the system prompt on every single message. That's ~35,600 tokens per message, $1.51 wasted per 100-message session, and 93.5% of your token budget on static content that never changes.

View Issue #9157 →

There's a proposed fix in the pipeline: a config option to change injection from "always" to "first-message-only." Until that ships, I've been using the session initialization technique below as a workaround.

🔗
Strategies to trim/optimize System Prompt (Discussion #10234)
github.com/openclaw/openclaw/discussions

The community is actively working on this. But right now, the workaround I've been using is straightforward: tell your agent exactly what to load.


Part 1: Session Initialization

The Problem

Your agent loads 50KB of history on every message. This wastes 2-3M tokens per session and costs $4/day. Third-party interfaces without built-in session clearing make this worse.

One user reported their main session occupied 56-58% of the 400K context window — over 200K tokens consumed before they typed a single word.

🔗
Why is OpenClaw so token-intensive? (Apiyi.com)
help.apiyi.com

The Fix

Add this session initialization rule to your agent's system prompt. It tells the agent exactly what to load and what to skip:

SESSION INITIALIZATION RULE:

On every session start:
1. Load ONLY these files:
   - SOUL.md
   - USER.md
   - IDENTITY.md
   - memory/YYYY-MM-DD.md (if it exists)

2. DO NOT auto-load:
   - MEMORY.md
   - Session history
   - Prior messages
   - Previous tool outputs

3. When user asks about prior context:
   - Use memory_search() on demand
   - Pull only the relevant snippet with memory_get()
   - Don't load the whole file

4. Update memory/YYYY-MM-DD.md at end of session with:
   - What you worked on
   - Decisions made
   - Leads generated
   - Blockers
   - Next steps

Why It Works

Session starts with 8KB instead of 50KB. History loads only when asked. Daily notes become your actual memory. Works with any interface.
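Step 4 of the rule above (updating the daily memory file) can be automated. This is a hypothetical helper, not an OpenClaw API: the memory/YYYY-MM-DD.md layout comes from the rule, but the function name and summary fields are my own:

```python
from datetime import date
from pathlib import Path

def append_session_summary(workspace: str, summary: dict) -> Path:
    """Append a structured end-of-session summary to memory/YYYY-MM-DD.md."""
    memory_dir = Path(workspace) / "memory"
    memory_dir.mkdir(parents=True, exist_ok=True)
    note = memory_dir / f"{date.today():%Y-%m-%d}.md"

    # The five fields from step 4 of the session initialization rule.
    sections = ["worked on", "decisions made", "leads generated",
                "blockers", "next steps"]
    lines = ["\n## Session summary\n"]
    lines += [f"- **{s.title()}:** {summary.get(s, '-')}\n" for s in sections]

    with note.open("a", encoding="utf-8") as f:
        f.writelines(lines)
    return note
```

Run it at the end of each session; because it appends rather than overwrites, multiple sessions in one day stack up in the same daily note.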

Pro Tip: Fresh Sessions

Use the /new command to start fresh sessions regularly. Run diagnostic commands in isolated sessions to prevent context bloat in your main workspace. The OpenClaw session management docs cover compaction and pruning strategies in detail.

Before

  • 50KB context on startup
  • 2-3M tokens wasted per session
  • $0.40 per session
  • History bloat over time

After

  • 8KB context on startup
  • Only loads what's needed
  • $0.05 per session
  • Clean daily memory files
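The exact before/after dollar figures depend on your model tier and message count, but the underlying formula is simple: static context re-sent on every message scales linearly with both. A rough estimate, assuming ~4 characters per token (my assumption) and Haiku input pricing from the table earlier:

```python
def session_context_cost(context_kb, messages, price_per_m_input):
    """Dollars spent just re-sending static context, assuming ~4 chars/token."""
    tokens_per_message = context_kb * 1024 / 4
    return tokens_per_message * messages * price_per_m_input / 1_000_000

# 100-message session, Haiku input at $1.00/M tokens:
before = session_context_cost(50, 100, 1.00)  # 50KB context on every message
after = session_context_cost(8, 100, 1.00)    # 8KB context after the rule
print(f"before ${before:.2f}, after ${after:.2f}")
```

Whatever pricing you plug in, the 50KB-to-8KB reduction flows straight through as an 84% cut in context cost, because the per-token price cancels out of the ratio.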

Part 2: Model Routing

Out of the box, OpenClaw defaults to Sonnet for everything. Sonnet is excellent. It's also overkill for checking file status, running simple commands, or routine monitoring.

This is the single highest-impact optimization. After testing model routing across multiple setups, the results were consistent: 50-80% cost reduction just by defaulting to Haiku and reserving Sonnet for complex reasoning. Others in the community are seeing the same thing:

Zen Van Riel
@zenvanriel
Smart model routing cut my OpenClaw costs by 60%. Route simple tasks to Haiku or Gemini Flash-Lite. Reserve Sonnet for real work.
Jan 2026

The difference between Sonnet's $0.003 per 1K input tokens and Gemini Flash-Lite's $0.0005 per 1K is 6x. Most tasks simply don't need frontier-level intelligence.

Step 1: Update Your Config

Your OpenClaw config file lives at ~/.openclaw/openclaw.json:

{
  "agents": {
    "defaults": {
      "model": {
        "primary": "anthropic/claude-haiku-4-5"
      },
      "models": {
        "anthropic/claude-sonnet-4-5": {
          "alias": "sonnet"
        },
        "anthropic/claude-haiku-4-5": {
          "alias": "haiku"
        }
      }
    }
  }
}

This sets Haiku as the default and creates aliases so your prompts can say "use sonnet" or "use haiku" to switch on-demand.

Step 2: Add Routing Rules to System Prompt

MODEL SELECTION RULE:

Default: Always use Haiku
Switch to Sonnet ONLY when:
- Architecture decisions
- Production code review
- Security analysis
- Complex debugging/reasoning
- Strategic multi-project decisions

When in doubt: Try Haiku first.

OpenRouter Auto Model

If you want automated routing, OpenRouter's Auto Model (openrouter/openrouter/auto) automatically selects the most cost-effective model for each request. There's also a proposed middleware hook (Issue #10969) for dynamic model routing based on message classification.
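Until something like that middleware ships, the routing logic is simple enough to sketch yourself. This is an illustrative classifier, not OpenClaw's actual implementation; the keyword list and escalation rules are assumptions that mirror the MODEL SELECTION RULE above:

```python
# Illustrative message classifier for model routing; not OpenClaw internals.
# Signals roughly matching the "switch to Sonnet" cases in the rule above.
COMPLEX_SIGNALS = ("architecture", "security", "review", "debug",
                   "refactor", "design", "migrate")

def route_model(message: str) -> str:
    """Default to Haiku; escalate to Sonnet only on complex-task signals."""
    text = message.lower()
    if any(signal in text for signal in COMPLEX_SIGNALS):
        return "anthropic/claude-sonnet-4-5"
    return "anthropic/claude-haiku-4-5"

print(route_model("check the status of the build"))        # routine -> Haiku
print(route_model("review this auth architecture change"))  # complex -> Sonnet
```

A keyword heuristic like this is crude; a production router would classify with a cheap model first. But even the crude version captures most of the savings, because routine messages vastly outnumber complex ones.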

The Ultra-Cheap Tier

For tasks that barely need intelligence (heartbeats, pings, simple lookups), you can go even cheaper than Haiku:

Model             | Cost per 1M tokens | vs Opus
Gemini Flash-Lite | $0.50              | 60x cheaper
DeepSeek V3.2     | $0.53              | 56x cheaper
GPT-4o-mini       | $0.60              | 50x cheaper
🔗
Stop overpaying for OpenClaw: Multi-model routing guide
velvetshark.com

Before

  • Sonnet for everything
  • $3.00 per 1M input tokens
  • Overkill for simple tasks
  • $50-70/month on models

After

  • Haiku by default, Sonnet when needed
  • $1.00 per 1M input tokens
  • Right model for the job
  • $5-15/month on models

Part 3: Heartbeat to Ollama

OpenClaw sends periodic heartbeat checks to verify your agent is running. By default, these use your paid API. Running 24/7, that's real money for a health check.

Real Case

One user set email checks every 5 minutes via heartbeat. Each heartbeat included the full session context. Result: $50/day from heartbeats alone.

Source: Running OpenClaw Without Burning Money (GitHub Gist)
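That $50/day figure checks out if you assume each heartbeat re-sends a full session context of roughly 58K tokens at Sonnet input pricing (the context size and model are my assumptions, not stated in the report):

```python
def heartbeat_daily_cost(interval_minutes, context_tokens, price_per_m_input):
    """Dollars per day when every heartbeat re-sends the full session context."""
    beats_per_day = 24 * 60 / interval_minutes
    return beats_per_day * context_tokens * price_per_m_input / 1_000_000

# Every 5 minutes, ~58K-token context, Sonnet input at $3/M:
print(f"${heartbeat_daily_cost(5, 58_000, 3.00):.2f}/day")
# Same context, hourly heartbeats instead:
print(f"${heartbeat_daily_cost(60, 58_000, 3.00):.2f}/day")
```

Even just widening the interval from 5 minutes to an hour cuts the bill 12x; routing the heartbeat to a local model, as below, takes it to zero.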

The fix: route heartbeats to a free local LLM using Ollama. This is fully documented in the OpenClaw docs.

Step 1: Install Ollama

# macOS / Linux
curl -fsSL https://ollama.ai/install.sh | sh

# Pull a lightweight model for heartbeats
ollama pull llama3.2:3b

Model Choice

llama3.2:3b (2GB) balances size and capability. For function calling, the OpenClaw docs recommend qwen2.5-coder or qwen3. Minimum context requirement is 64K tokens. Streaming is disabled by default for Ollama due to SDK response format issues.

Step 2: Configure OpenClaw for Ollama Heartbeat

{
  "agents": {
    "defaults": {
      "model": {
        "primary": "anthropic/claude-haiku-4-5"
      },
      "models": {
        "anthropic/claude-sonnet-4-5": { "alias": "sonnet" },
        "anthropic/claude-haiku-4-5": { "alias": "haiku" }
      }
    }
  },
  "heartbeat": {
    "every": "1h",
    "model": "ollama/llama3.2:3b",
    "session": "main",
    "target": "slack",
    "prompt": "Check: Any blockers, opportunities, or progress updates needed?"
  }
}

Step 3: Verify

# Make sure Ollama is running
ollama serve

# In another terminal, test the model
ollama run llama3.2:3b "respond with OK"
# Should respond quickly with "OK"
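You can also script the check. Ollama's local server listens on port 11434, and its /api/tags endpoint lists the models you've pulled; this probe returns None if the server isn't reachable:

```python
import json
import urllib.request
import urllib.error

def ollama_models(base_url="http://localhost:11434", timeout=2.0):
    """Return the list of pulled model names, or None if Ollama isn't running."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as r:
            data = json.load(r)
        return [m["name"] for m in data.get("models", [])]
    except (urllib.error.URLError, OSError):
        return None

models = ollama_models()
if models is None:
    print("Ollama is not running — start it with `ollama serve`")
elif not any(name.startswith("llama3.2") for name in models):
    print("Heartbeat model missing — run `ollama pull llama3.2:3b`")
```

Dropping a check like this into a cron job or startup script catches the failure mode where Ollama dies silently and heartbeats start erroring.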
🔗
Using OpenClaw with Ollama: Building a Local Data Analyst
datacamp.com/tutorial
🔗
Ollama Provider - Official OpenClaw Documentation
docs.openclaw.ai/providers/ollama

Before

  • Heartbeats use paid API
  • Each heartbeat = full context resent
  • $5-50/month on heartbeats
  • Adds to rate limit usage

After

  • Heartbeats use free local LLM
  • Zero API calls for heartbeats
  • $0/month for heartbeats
  • No impact on rate limits

Part 4: Rate Limits & Budget Controls

Even with model routing and optimized sessions, runaway automation can still burn through tokens. These rate limits act as guardrails.

I've seen multiple reports in GitHub discussions of users waking up to $200-500 bills from agents stuck in repetitive task loops overnight. One common pattern: a failed operation the agent keeps retrying indefinitely. Without rate limits, there's nothing to stop the bleeding.

Add to Your System Prompt

RATE LIMITS:

- 5 seconds minimum between API calls
- 10 seconds between web searches
- Max 5 searches per batch, then 2-minute break
- Batch similar work (one request for 10 leads, not 10 requests)
- If you hit 429 error: STOP, wait 5 minutes, retry

DAILY BUDGET: $5 (warning at 75%)
MONTHLY BUDGET: $200 (warning at 75%)
Limit                      | What It Prevents
5s between API calls       | Rapid-fire requests that burn tokens
10s between searches       | Expensive search loops
5 searches max, then break | Runaway research tasks
Batch similar work         | 10 calls when 1 would do
Budget warnings at 75%     | Surprise bills at end of month
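If your agent drives scripted tool calls, the same guardrails can be enforced in code rather than relying on the model to obey the prompt. A minimal sketch (the limits mirror the prompt rules above; nothing here is an OpenClaw API):

```python
import time

class RateLimiter:
    """Enforce a minimum gap between calls and back off hard on HTTP 429."""

    def __init__(self, min_interval=5.0, backoff_429=300.0):
        self.min_interval = min_interval  # seconds between API calls
        self.backoff_429 = backoff_429    # "wait 5 minutes" on rate-limit errors
        self._last_call = 0.0

    def wait(self):
        """Sleep until min_interval has passed since the previous call."""
        gap = time.monotonic() - self._last_call
        if gap < self.min_interval:
            time.sleep(self.min_interval - gap)
        self._last_call = time.monotonic()

    def on_response(self, status_code):
        """STOP and wait out the backoff on a 429, per the rule above."""
        if status_code == 429:
            time.sleep(self.backoff_429)

limiter = RateLimiter(min_interval=5.0)
# Call limiter.wait() before each API call and
# limiter.on_response(resp.status) after each response.
```

The prompt-level rules and the code-level limiter are complementary: the prompt shapes agent behavior, the limiter guarantees it.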

Workspace File Templates

Keep these lean. Every line costs tokens on every request.

# SOUL.md

## Core Principles
[YOUR AGENT PRINCIPLES HERE]

## Model Selection
Default: Haiku
Switch to Sonnet only for: architecture, security, complex reasoning

## Rate Limits
5s between API calls, 10s between searches, max 5/batch then 2min break

# USER.md

- **Name:** [YOUR NAME]
- **Timezone:** [YOUR TIMEZONE]
- **Mission:** [WHAT YOU'RE BUILDING]

## Success Metrics
- [METRIC 1]
- [METRIC 2]
- [METRIC 3]

Part 5: Prompt Caching

Your system prompt, workspace files, and reference materials get sent with every API call. Prompt caching gives you a 90% discount on repeated content.

How It Actually Works

Event                    | Cost                   | Notes
Cache write (5-min TTL)  | 1.25x base input price | Slight premium on first send
Cache write (1-hour TTL) | 2x base input price    | $6/M for Sonnet vs $3/M base
Cache read (hit)         | 0.1x base input price  | 90% discount
Cache miss               | 1x base input price    | Full price, cache expired
🔗
Prompt Caching - Official Claude API Documentation
platform.claude.com/docs
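Using the multipliers from the table, you can compute the break-even point: how many sends of the same prompt within one TTL it takes before caching beats full price (shown here for the 5-minute TTL at its 1.25x write premium):

```python
def cached_cost(sends, base, write_mult=1.25, read_mult=0.10):
    """Cost of `sends` identical prompts within one cache TTL: one write, rest reads."""
    if sends == 0:
        return 0.0
    return base * (write_mult + read_mult * (sends - 1))

def uncached_cost(sends, base):
    """Same prompts at full base price every time."""
    return base * sends

base = 0.30  # e.g. a 100K-token prompt at Sonnet's $3/M input
for sends in (1, 2, 5, 50):
    print(f"{sends:3d} sends: cached ${cached_cost(sends, base):.3f}"
          f" vs uncached ${uncached_cost(sends, base):.3f}")
```

Caching loses on a one-off send (you pay the 1.25x premium for nothing) and wins from the second send onward; by 50 sends the cached cost approaches one-tenth of full price.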
Cache Warming Strategy

Configure your heartbeat interval to be slightly less than the cache TTL. Example: cache TTL = 1 hour, heartbeat every 55 minutes. This keeps your cache warm and prevents expensive cold starts. Source: Apiyi token guide

What to Cache vs Skip

Cache These (Stable)

  • System prompts (SOUL.md, USER.md)
  • Tool documentation (TOOLS.md)
  • Reference materials (docs, specs)
  • Project templates

Skip These (Dynamic)

  • Daily memory files
  • Recent user messages
  • Tool outputs (change per task)
  • Active project notes

File Structure for Maximum Cache Hits

/workspace/
  ├── SOUL.md                     ← Cache (stable)
  ├── USER.md                     ← Cache (stable)
  ├── TOOLS.md                    ← Cache (stable)
  ├── memory/
  │   ├── MEMORY.md               ← Don't cache (updated frequently)
  │   └── 2026-02-03.md           ← Don't cache (daily notes)
  └── projects/
      └── [PROJECT]/REFERENCE.md  ← Cache (stable docs)

Enable in Config

{
  "agents": {
    "defaults": {
      "cache": {
        "enabled": true,
        "ttl": "5m",
        "priority": "high"
      },
      "models": {
        "anthropic/claude-sonnet-4-5": {
          "alias": "sonnet",
          "cache": true
        },
        "anthropic/claude-haiku-4-5": {
          "alias": "haiku",
          "cache": false
        }
      }
    }
  }
}

Batch API: Another 50% Off

The Claude Batch API gives a flat 50% discount on both input and output tokens. Batches typically finish in under 1 hour. You can combine batch pricing with prompt caching for even deeper savings. Ideal for background tasks and bulk operations where immediate response isn't critical.
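Stacking the two discounts is straightforward multiplication. The sketch below assumes the batch discount applies on top of the cache-read rate (worth confirming against your actual bill):

```python
BATCH_DISCOUNT = 0.50    # flat 50% off input and output via the Batch API
CACHE_READ_MULT = 0.10   # 90% off cached input on a cache hit

sonnet_input = 3.00  # $/1M input tokens, base price

batched = sonnet_input * BATCH_DISCOUNT
cached = sonnet_input * CACHE_READ_MULT
both = sonnet_input * BATCH_DISCOUNT * CACHE_READ_MULT

print(f"batch only:  ${batched:.2f}/M input")
print(f"cache hit:   ${cached:.2f}/M input")
print(f"batch+cache: ${both:.3f}/M input")
```

Under that assumption, cached input in a batch job lands at $0.15 per million tokens on Sonnet, a 95% reduction from the $3.00 base rate.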

Real-World Example: 50 Outreach Drafts/Week

Metric             | Without Caching        | With Caching (Batched)
System prompt      | 5KB x 50 = 250KB/week  | 1 write + 49 cached
System prompt cost | $0.75/week             | $0.016/week
50 drafts          | $1.20/week             | $0.60/week (~50% cache hits)
Total              | $1.95/week ($102/mo)   | $0.62/week ($32/mo)

Savings: $70/month on a single workflow.

When NOT to Cache

Caching only pays off when content is actually reused. The cache write costs 1.25x the base input price, so a prompt sent once within the TTL costs more with caching than without. Skip caching for one-off tasks, for content that changes between requests (the "Skip These" list above), and for sessions shorter than the cache TTL.

Part 6: Monitoring & Observability

You can't optimize what you can't measure. These tools show where your tokens actually go.

Tool          | What It Does                                                                                        | Link
Tapes         | Records every API call. Full visibility into prompts, token usage, agent behavior.                  | Built-in
OpenTelemetry | Built-in OTEL support (v2026.2+). Metrics: openclaw.tokens, openclaw.cost.usd, openclaw.context.tokens | GitHub
TokScale      | CLI tool for real-time token usage tracking with pricing breakdowns.                                | GitHub
Portkey       | Request logs, cost tracking, automatic failovers, team controls.                                    | Docs

Verify Your Setup

# Start a session
openclaw shell

# Check current status
session_status

# You should see:
# - Context size: 2-8KB (not 50KB+)
# - Model: Haiku (not Sonnet)
# - Heartbeat: Ollama/local

Cache Performance Check

# Check cache effectiveness
openclaw shell
session_status

# Look for cache metrics:
# Cache hits: 45/50 (90%)
# Cache tokens used: 225KB (vs 250KB without cache)
# Cost savings: $0.22 this session

Troubleshooting

Issue                             | Fix
Context size still large          | Check session initialization rules are in system prompt
Still using Sonnet for everything | Verify ~/.openclaw/openclaw.json syntax and path
Heartbeat errors                  | Make sure Ollama is running (ollama serve)
Costs haven't dropped             | Use TokScale or Tapes to see where tokens actually go
Cache hit rate below 80%          | System prompt is changing too often. Batch updates to maintenance windows.

Quick Reference Checklist

Session Initialization
[ ] Added SESSION INITIALIZATION RULE to system prompt
[ ] Configured idle window (24h) and archive period (90d)

Model Routing
[ ] Updated ~/.openclaw/openclaw.json with Haiku as default
[ ] Added MODEL SELECTION RULE to system prompt
[ ] Evaluated ultra-cheap models for heartbeats (Gemini Flash-Lite, DeepSeek)

Heartbeat to Ollama
[ ] Installed Ollama and pulled llama3.2:3b (or qwen2.5-coder)
[ ] Added heartbeat config pointing to Ollama
[ ] Verified Ollama is running (ollama serve)

Rate Limits & Workspace
[ ] Added RATE LIMITS to system prompt
[ ] Created lean SOUL.md and USER.md
[ ] Set daily ($5) and monthly ($200) budget caps

Prompt Caching
[ ] Enabled caching in config (Sonnet: true, Haiku: false)
[ ] Separated stable files from dynamic files
[ ] Set heartbeat interval < cache TTL for warm cache

Monitoring
[ ] Installed TokScale or configured Tapes
[ ] Verified cache hit rate > 80%
[ ] Ran session_status to confirm

The Bottom Line

No complex infrastructure changes needed. After testing all of these across multiple setups, the pattern is clear: smart config, clear rules in your system prompt, a free local LLM for heartbeats, and monitoring to know where your tokens go.

Optimization                  | Savings                      | Effort
Session Initialization        | 80% less context overhead    | 5 min (system prompt edit)
Model Routing (Haiku default) | 50-80% on model costs        | 5 min (config change)
Heartbeat to Ollama           | 100% heartbeat costs gone    | 10 min (install + config)
Rate Limits                   | Prevents $200+ runaway bills | 5 min (system prompt edit)
Prompt Caching                | 90% on repeated content      | 5 min (config change)
Monitoring (TokScale/Tapes)   | Visibility into actual spend | 5 min (install)

Combined result: from $1,500+/month down to $30-50/month.

The intelligence is in the prompt, not the infrastructure.

Further Reading

🔗
Session Management - OpenClaw Docs
docs.openclaw.ai/concepts/session
🔗
Workspace file injection wastes 93.5% of token budget (Issue #9157)
github.com/openclaw/openclaw/issues/9157
🔗
Stop overpaying for OpenClaw: Multi-model routing guide
velvetshark.com
🔗
Prompt Caching - Claude API Documentation
platform.claude.com/docs
🔗
Batch Processing (50% discount) - Claude API Documentation
platform.claude.com/docs
🔗
How to Manage OpenClaw Sessions & Context Pruning
openclawexperts.io