OpenClaw Money-Saving Strategy: Saving Two Thousand a Month - What Am I Doing Right?

By: blockbeats|2026/03/10 10:00:01

Original Article Title: Why My OpenClaw Sessions Burned 21.5M Tokens in a Day (And What Actually Fixed It)
Original Article Author: MOSHIII
Translation: Peggy, BlockBeats

Editor's Note: Amid the rapid adoption of Agent applications, many teams have run into a seemingly anomalous phenomenon: the system appears to run smoothly while token costs keep rising unnoticed. This article shows that in a real OpenClaw workload, the cost explosion often stems not from user input or model output but from an overlooked cause: cached prefix replay. The model re-reads a large historical context on every call, which drives significant token consumption.

The article, using specific session data, demonstrates how large intermediate artifacts such as tool outputs, browser snapshots, JSON logs, etc., are continuously written into the historical context and repetitively read in the agent loop.

Through this case study, the author lays out a clear optimization approach that runs from context structure design and tool output management to compaction configuration. For developers building Agent systems, this is not only a technical troubleshooting record but also a practical money-saving strategy.

Below is the original article:

I analyzed a real OpenClaw workload and discovered a pattern that I believe many Agent users will recognize:

The token usage looks "active."

The replies appear normal.

But the token consumption suddenly explodes.

Below is a breakdown of the structure, the root cause, and a practical path to a fix.

TL;DR

The biggest cost driver is not overly long user messages. It is the massive cached prefix being repeatedly replayed.

From the session data:

Total tokens: 21,543,714

cacheRead: 17,105,970 (79.40%)

input: 4,345,264 (20.17%)

output: 92,480 (0.43%)

In other words, the majority of the cost of inference is not in processing new user intent, but in repeatedly reading a massive historical context.
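The percentages above can be re-derived from the three raw counters. This is only a quick arithmetic check using the figures quoted in this article; the variable names are illustrative.

```python
# Token counters quoted in the session data above.
usage = {"cacheRead": 17_105_970, "input": 4_345_264, "output": 92_480}

# The three counters should add up to the reported total.
total = sum(usage.values())

# Share of each counter, as a percentage rounded to two decimals.
shares = {kind: round(100 * count / total, 2) for kind, count in usage.items()}

print(f"total: {total:,}")
for kind, pct in shares.items():
    print(f"{kind}: {pct}%")
```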

The "Wait, Why?" Moment

I originally thought high token usage came from: very long user prompts, extensive output generation, or expensive tool invocations.

But the predominant pattern is:

input: hundreds to thousands of tokens

cacheRead: 170k to 180k tokens per call

In other words, the model is rereading the same massive stable prefix every round.

Data Scope

I analyzed data at two levels:

1. Runtime logs
2. Session transcripts

It's worth noting that:

Runtime logs are primarily used to observe behavioral signals (e.g., restarts, errors, configuration issues)

Precise token counts come from the usage field in session JSONL
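As a rough illustration of reading token counts from session JSONL, here is a minimal sketch. It is not the author's `session_token_breakdown.py`. The record layout is an assumption: a `usage` object either at the top level of a record or nested under `message`, with the counters `input`, `output`, and `cacheRead`. Adjust the lookups to match your actual transcripts.

```python
import json
from collections import Counter

def sum_usage(lines):
    """Sum token counters over JSONL records that carry a usage object."""
    totals = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Assumed layout: usage at the top level or nested under "message".
        usage = record.get("usage")
        if not isinstance(usage, dict):
            message = record.get("message")
            usage = message.get("usage") if isinstance(message, dict) else None
        if not isinstance(usage, dict):
            continue  # record without usage data
        for key in ("input", "output", "cacheRead"):
            totals[key] += int(usage.get(key, 0) or 0)
    return dict(totals)
```

Because the function accepts any iterable of lines, an open file handle for a session file can be passed to it directly.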

Scripts used:

scripts/session_token_breakdown.py

scripts/session_duplicate_waste_analysis.py

Analysis files generated:

tmp/session_token_stats_v2.txt

tmp/session_token_stats_v2.json

tmp/session_duplicate_waste.txt

tmp/session_duplicate_waste.json

tmp/session_duplicate_waste.png


Where is the Token Actually Being Consumed?

1) Session Concentration

There is one session that consumes significantly more than others:

570587c3-dc42-47e4-9dd4-985c2a50af86: 19,204,645 tokens

This is followed by a sharp drop-off:

ef42abbb-d8a1-48d8-9924-2f869dea6d4a: 1,505,038

ea880b13-f97f-4d45-ba8c-a236cf6f2bb5: 649,584

2) Behavior Concentration

The tokens mainly come from:

toolUse: 16,372,294

stop: 5,171,420

The problem lies mainly in tool-call loops, not in regular chat.

3) Time Concentration

The token peaks are not random but rather concentrated in a few time slots:

2026-03-08 16:00: 4,105,105

2026-03-08 09:00: 4,036,070

2026-03-08 07:00: 2,793,648
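An hourly view like this one can be produced by bucketing per-call totals by hour. The sketch below assumes each call has already been reduced to a pair of an ISO 8601 timestamp (without a timezone suffix) and a total token count. Both the input shape and the function name are hypothetical.

```python
from collections import Counter
from datetime import datetime

def tokens_by_hour(records):
    """Group (timestamp, total_tokens) pairs into hourly buckets, largest first."""
    buckets = Counter()
    for timestamp, total_tokens in records:
        # Truncate the timestamp to the start of its hour.
        hour = datetime.fromisoformat(timestamp).strftime("%Y-%m-%d %H:00")
        buckets[hour] += total_tokens
    return buckets.most_common()
```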

What Exactly Is in the Massive Cache Prefix?

It's not the conversation content but mainly large intermediate artifacts:

Massive toolResult data blocks

Lengthy reasoning/thinking traces

Large JSON snapshots

File lists

Browser fetch data

Sub-Agent conversation logs

In the largest session, the character count is approximately:

toolResult:text: 366,469 characters

assistant:thinking: 331,494 characters

assistant:toolCall: 53,039 characters

Once these contents are retained in the historical context, each subsequent invocation may retrieve them again via a cache prefix.

Specific Example (from session file)

Very large context blocks appear at the following locations:

sessions/570587c3-dc42-47e4-9dd4-985c2a50af86.jsonl:70

Large Gateway JSON Log (approx. 37,000 characters)

sessions/570587c3-dc42-47e4-9dd4-985c2a50af86.jsonl:134

Browser Snapshot + Security Encapsulation (approx. 29,000 characters)

sessions/570587c3-dc42-47e4-9dd4-985c2a50af86.jsonl:219

Large File List Output (approx. 41,000 characters)

sessions/570587c3-dc42-47e4-9dd4-985c2a50af86.jsonl:311

session/status Status Snapshot + Large Prompt Structure (approx. 30,000 characters)

"Duplicate Content Waste" vs "Cache Replay Burden"

I also measured the duplicate content ratio within a single invocation:

Approximate duplication ratio: 1.72%

It does exist but is not the primary issue.

The real problem is that the absolute size of the cache prefix is too large.

The structure is: a massive historical context is re-read on every call, with only a small amount of new input stacked on top.

Therefore, the optimization focus is not on deduplication, but on context structure design.
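A back-of-the-envelope calculation shows why prefix size matters more than deduplication. It assumes every call replays a prefix in the observed 170k to 180k range, which is a simplification. The 40k compacted prefix is a purely hypothetical target, not a measured result.

```python
cache_read_total = 17_105_970                 # cacheRead total from the session data
prefix_low, prefix_high = 170_000, 180_000    # observed cacheRead per call

# Number of calls implied by the total, under the simplifying assumption above.
calls_low = cache_read_total / prefix_high
calls_high = cache_read_total / prefix_low
print(f"implied calls: {calls_low:.0f} to {calls_high:.0f}")

# Hypothetical scenario: same call count, prefix compacted to 40k tokens.
compacted_prefix = 40_000
midpoint_calls = round(cache_read_total / 175_000)
projected_cache_read = midpoint_calls * compacted_prefix
print(f"projected cacheRead: {projected_cache_read:,}")
```

Under these assumptions, replayed volume scales linearly with prefix size, so shrinking the prefix reduces cacheRead far more than removing the 1.72% of duplicated content would.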

Why Are Agent Loops Particularly Prone to This Issue?

Three mechanisms overlap:

1. A large amount of tool output is written to the historical context

2. Tool loops generate many calls at short intervals

3. The prefix barely changes between calls → the cache is re-read every time

If context compaction is not triggered reliably, the issue escalates quickly.

Most Critical Remediation Strategies (by impact)

P0—Avoid stuffing massive tool output into long-lived context

For oversized tool output:

· Keep summary + reference path / ID

· Write original payload to a file artifact

· Do not retain the full original text in chat history

Limit these categories first:

· Large JSON

· Long directory lists

· Browser full snapshots

· Sub-Agent full transcripts
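The "summary + reference" idea could look roughly like the sketch below. It is not an OpenClaw API. The function name, the 2,000-character threshold, and the artifact naming scheme are all assumptions chosen for illustration.

```python
import hashlib
from pathlib import Path

MAX_INLINE_CHARS = 2_000  # assumed threshold; tune for your workload

def compact_tool_output(text, artifact_dir, max_inline=MAX_INLINE_CHARS, preview=200):
    """Return small outputs unchanged; store large ones on disk and return a stub."""
    if len(text) <= max_inline:
        return text
    # Name the artifact after a content hash so identical payloads share one file.
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]
    directory = Path(artifact_dir)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"tool-output-{digest}.txt"
    path.write_text(text, encoding="utf-8")
    # Only this short stub would enter the chat history.
    return (
        f"[tool output stored: {path} ({len(text)} chars)]\n"
        f"preview: {text[:preview]}"
    )
```

The stub keeps a reference path and a short preview, so the agent can still re-open the full payload when it needs it. A real summary step could replace the naive preview.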

P1—Ensure compaction mechanism truly takes effect

In this dataset, configuration compatibility issues came up repeatedly: the compaction key was invalid.

This silently disables the optimization mechanism.

Correct approach: use only version-compatible configurations

Then verify:

openclaw doctor --fix

and check startup logs to confirm compaction acceptance.

P1—Reduce reasoning text persistence

Avoid long reasoning texts being replayed repeatedly

In a production environment: save brief summaries instead of complete reasoning
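One way to keep old reasoning out of the replayed history is to strip it from earlier assistant turns before the next call. The sketch assumes messages are dicts with a `role` and a `content` list of typed blocks, with `thinking` as the block type for reasoning. That shape is inferred from the `assistant:thinking` label above and may not match your runtime. Keeping the most recent assistant turn intact is a conservative choice, not a documented requirement.

```python
def drop_old_thinking(messages, keep_last=1):
    """Remove 'thinking' blocks from all but the last `keep_last` assistant turns."""
    assistant_idx = [i for i, m in enumerate(messages) if m.get("role") == "assistant"]
    protected = set(assistant_idx[-keep_last:]) if keep_last > 0 else set()
    result = []
    for i, message in enumerate(messages):
        content = message.get("content")
        if message.get("role") == "assistant" and i not in protected and isinstance(content, list):
            kept = [
                block for block in content
                if not (isinstance(block, dict) and block.get("type") == "thinking")
            ]
            message = {**message, "content": kept}  # copy; do not mutate the input
        result.append(message)
    return result
```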

P2—Improve prompt caching design

The goal is not to maximize cacheRead. The goal is to use the cache on compact, stable, high-value prefixes.

Recommendations:

· Put stable rules into system prompt

· Avoid putting unstable data under stable prefixes

· Avoid injecting large amounts of debug data each round

Implementation Stop-Loss Plan (if I were to tackle it tomorrow)

1. Identify the session with the highest cacheRead percentage
2. Run /compact on runaway sessions

3. Add truncation + artifacting to tool outputs

4. Rerun token stats after each modification

Focus on tracking four KPIs:

cacheRead / totalTokens

toolUse avgTotal/call

Calls with >= 100k tokens

Maximum session percentage
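The four KPIs can be computed from per-call records in a few lines. The record shape used here (`session`, `stopReason`, `input`, `output`, `cacheRead`) is an assumption for illustration, as are the function and key names.

```python
from collections import defaultdict

def compute_kpis(calls, big_call_threshold=100_000):
    """Compute the four tracking KPIs from a list of per-call usage records."""
    total = cache_read = tool_total = tool_calls = big_calls = 0
    per_session = defaultdict(int)
    for call in calls:
        call_total = call["input"] + call["output"] + call["cacheRead"]
        total += call_total
        cache_read += call["cacheRead"]
        per_session[call["session"]] += call_total
        if call["stopReason"] == "toolUse":
            tool_total += call_total
            tool_calls += 1
        if call_total >= big_call_threshold:
            big_calls += 1
    if total == 0:
        raise ValueError("no token usage found")
    return {
        "cache_read_share": cache_read / total,
        "tooluse_avg_total_per_call": tool_total / tool_calls if tool_calls else 0.0,
        "calls_over_threshold": big_calls,
        "max_session_share": max(per_session.values()) / total,
    }
```

Running this before and after each change gives a direct comparison on the same four numbers.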

Success Signals

If the optimization is successful, you should see:

A noticeable reduction in calls with 100k+ tokens

A decrease in cacheRead percentage

A decrease in average tokens per toolUse call

A decrease in the dominance of individual sessions

If these metrics do not change, your context policies are still too loose.

Commands to Reproduce the Analysis

python3 scripts/session_token_breakdown.py 'sessions' \
--include-deleted \
--top 20 \
--outlier-threshold 120000 \
--json-out tmp/session_token_stats_v2.json \
> tmp/session_token_stats_v2.txt

python3 scripts/session_duplicate_waste_analysis.py 'sessions' \
--include-deleted \
--top 20 \
--png-out tmp/session_duplicate_waste.png \
--json-out tmp/session_duplicate_waste.json \
> tmp/session_duplicate_waste.txt

Conclusion

If your Agent system appears to be working fine but costs are continually rising, you may want to first check for one issue: Are you paying for new inferences or for large-scale replay of old contexts?

In my case, the majority of costs actually came from context replays.

Once you realize this, the solution becomes clear: Strictly control the data entering long-lived contexts.

[Original Article Link]
