by B. Mert Köseoğlu

context-mode

The other half of the context problem.


Used across teams at

Microsoft · Google · Meta · ByteDance · Red Hat · GitHub · Stripe · Datadog · IBM · NVIDIA · Supabase · Amazon · Canva · Notion · Salesforce · Hasura · Framer · Cursor

The context window problem

Here's the thing about Claude Code, Cursor, Copilot, Codex, and Gemini CLI: under the hood, they all work the same way. The LLM has no memory. Every single turn, the full conversation gets re-sent to the API. Every file your agent reads, every command output, every grep result piles up, and it all ships back to the model on every turn that follows.

Run gh issue list once. 59 KB of JSON comes back. About 15,000 tokens. Next turn, those 15,000 tokens get sent again. Turn after that, again. Fifty turns in, that one command has cost you 750,000 input tokens.

Now do that 20 times in a session. Average output per call: 30 KB. That's 600 KB sitting in your conversation, re-sent 50 times over. 30 MB of input on tool output alone.
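The compounding is plain arithmetic. A quick sketch, assuming the common rule of thumb of roughly 4 bytes per token:

```javascript
// Rough model of how one tool output compounds across turns.
// BYTES_PER_TOKEN is an approximation (~4 bytes/token), not a measurement.
const BYTES_PER_TOKEN = 4;

function resendCost(outputBytes, laterTurns) {
  // The output sits in the conversation and is re-sent on every later turn.
  return (outputBytes / BYTES_PER_TOKEN) * laterTurns;
}

// One ~60 KB `gh issue list` result (~15,000 tokens), 50 later turns:
console.log(resendCost(60_000, 50)); // 750000 input tokens

// Twenty 30 KB outputs (600 KB total) re-sent over 50 turns, in bytes:
console.log(resendCost(600_000, 50) * BYTES_PER_TOKEN); // 30000000 (~30 MB)
```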

That's where your limit goes.

Eventually the context window fills up and the agent compacts. It summarizes the conversation. Throws away the originals. The agent forgets what it read, what it decided, what it was in the middle of building. You start over. You re-explain your codebase to a model that just lost its memory.

- 50% accuracy drop at 32K tokens, across 11 of 12 models tested (NoLiMa, 2025). Your agent makes worse decisions as context fills up.
- 15 min spent re-establishing context after each reset. Developers re-explain their codebase from scratch, every time.
- $60K/year for a 50-seat team on an agent that forgets everything every 20 minutes. $100/seat/month, burned.

Where the tokens come from:

- Playwright: 56 KB per snapshot
- GitHub Issues: 59 KB per query
- Access Logs: 45 KB per analysis
- Total cost: 30 MB re-sent per session

Intercept, sandbox, index

context-mode is an MCP server that sits between your agent and its tools. When a tool call would dump large output into the conversation, context-mode intercepts it and runs it in a sandboxed subprocess instead. The raw data never touches context. It goes into a local FTS5 database with BM25 ranking. The agent searches it when it needs to. Only a short summary enters the conversation.
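The intercept step can be sketched in a few lines. This is an illustration only: the threshold, function names, and stub format are assumptions, not context-mode's actual API.

```javascript
// Illustrative sketch: keep large tool output out of the conversation,
// index the full text locally, and return only a short stub.
// interceptToolResult and indexIntoDb are hypothetical names.
const SIZE_THRESHOLD = 10_000; // bytes; smaller outputs pass through

function interceptToolResult(toolName, rawOutput, indexIntoDb) {
  if (rawOutput.length < SIZE_THRESHOLD) {
    return rawOutput; // small output enters context unchanged
  }
  indexIntoDb(toolName, rawOutput); // full text goes to the local search index
  const lines = rawOutput.split('\n').length;
  // Only this short stub enters the conversation:
  return `[${toolName}: ${rawOutput.length} bytes, ${lines} lines captured. Search the index for details.]`;
}

// ~56 KB of fake JSON: the raw data is indexed, a ~70-byte stub comes back.
const raw = JSON.stringify(
  Array.from({ length: 2000 }, (_, i) => ({ id: i, body: 'issue text' }))
);
console.log(interceptToolResult('gh issue list', raw, () => {}));
```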

The plugin hooks into five points in the session lifecycle:

1. PreToolUse: route tool calls. Block curl/wget, redirect large output to the sandbox.
2. PostToolUse: capture events. File ops, git, errors, decisions → SessionDB.
3. SessionStart: restore state. Inject the resume snapshot, reload indexed knowledge.
4. PreCompact: preserve state. Build a snapshot before the context wipe.
5. UserPromptSubmit: capture intent. Track decisions, corrections, session mode.

Every tool call passes through a routing engine. Calls to curl, wget, or inline HTTP get blocked and rerouted to the sandbox. Build tools, large file reads, web fetches, same thing. Small commands that won't cause problems pass through untouched.
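A minimal sketch of such a routing table. The patterns and policies below are assumptions for illustration, not context-mode's real rules:

```javascript
// Illustrative routing: classify each command as block, sandbox, or passthrough.
const BLOCKED = [/^curl\b/, /^wget\b/];               // network fetches: denied
const SANDBOXED = [
  /^gh\s+issue\s+list/,                               // large JSON output
  /^npm\s+run\s+build/,                               // verbose build logs
  /\.log\b/,                                          // log-file reads
];

function route(command) {
  if (BLOCKED.some((re) => re.test(command))) return 'block';
  if (SANDBOXED.some((re) => re.test(command))) return 'sandbox';
  return 'passthrough'; // small, harmless commands run normally
}

console.log(route('curl https://example.com'));  // block
console.log(route('gh issue list --json body')); // sandbox
console.log(route('ls -la'));                    // passthrough
```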


Measured, not estimated

Real numbers from real sessions. Not projections.

- Playwright: 56.2 KB → 299 B (99.5% reduction)
- GitHub Issues: 58.9 KB → 1.1 KB (98%)
- Access Logs: 45.1 KB → 155 B (99.7%)
- Full session: 315 KB → 5.4 KB (98%)

Without context-mode, those 20 tool calls put 600 KB into the conversation. Over 50 turns, 30 MB re-sent. With it, the same calls produce 20 KB. Over 50 turns, 1 MB. Same work. Same answers. 30 times fewer tokens.


Think in Code

That handles the output side. Tool results stay in the sandbox. But there was still a problem with input. When the agent wants to analyze 47 files, it calls Read() on each one. Every file enters context. Every file stays there.

v1.0.64 made this a rule, not a suggestion. We call it Think in Code. It's mandatory across all 12 platform configs. The idea fits in one sentence: if you need to analyze, count, filter, or process data, write code that does it. Stop pulling raw data into context to process mentally.

```javascript
// Before: 47 files × Read() = 700 KB in context
// After:  1 ctx_execute()   = 3.6 KB in context
ctx_execute("javascript", `
  const fs = require('fs');
  const files = fs.readdirSync('src').filter(f => f.endsWith('.ts'));
  files.forEach(f => {
    const lines = fs.readFileSync('src/' + f, 'utf8').split('\\n').length;
    console.log(f + ': ' + lines + ' lines');
  });
`);
// 47 files analyzed. 15,314 LoC processed.
// Context used: 3.6 KB. Reduction: 200×.
```

The LLM writes a script. The script runs in a sandbox. Only stdout enters context. Your CPU does the work for free. Tokens cost money.


Your AI never starts from scratch

Most tools lose everything when context compacts. context-mode doesn't. Every file operation, every decision, every error and its fix gets written to a local SQLite database. Next session, it loads back in.

PostToolUse captures events across 15 categories as they happen. PreCompact builds a reference-based snapshot right before the wipe. SessionStart restores it on the next turn. The agent picks up mid-thought.

Your prompts · Files tracked · Project rules · Your decisions · Git operations · Errors & fixes · Environment · Session mode · Tool patterns

26 events carry over through each reset. Nothing lost. You don't explain your codebase twice.
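The capture-snapshot-restore cycle can be sketched roughly like this. The field and category names are illustrative, not the actual on-disk schema:

```javascript
// Illustrative shape of the PostToolUse → PreCompact → SessionStart cycle.
const events = [];

function capture(category, payload) {
  // PostToolUse: record an event as it happens.
  events.push({ category, payload, at: Date.now() });
}

function buildSnapshot() {
  // PreCompact: reference-based snapshot — store paths and decision text,
  // not full file contents, so the snapshot itself stays small.
  return {
    files: events.filter((e) => e.category === 'file_op').map((e) => e.payload.path),
    decisions: events.filter((e) => e.category === 'decision').map((e) => e.payload.text),
  };
}

capture('file_op', { path: 'src/auth.ts' });
capture('decision', { text: 'use JWT refresh tokens' });
// SessionStart would inject this snapshot into the next session:
console.log(buildSnapshot());
```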


12 platforms, one plugin

Works on Claude Code, Cursor, Copilot, Codex, Gemini CLI, and seven more. Platforms with hook support get automatic routing. The rest use instruction-based rules. Same plugin everywhere.

Claude Code · Cursor · Codex CLI · VS Code Copilot · Gemini CLI · Kiro · OpenCode · KiloCode · Zed · OpenClaw · Pi · Antigravity

Privacy-first. Your code never leaves your machine.

context-mode runs entirely on your local machine. No cloud. No telemetry. No accounts. No tracking. No analytics. Your code, your prompts, your file paths, your session data stays on your disk. Nothing is sent anywhere, ever. The plugin is source-available under the Elastic License 2.0. You can read every line, audit every hook, verify every claim.

- Dangerous command blocking: curl, wget, inline HTTP, and rm -rf blocked at PreToolUse. Redirected to the sandbox or denied.
- Environment variable denylist: 60+ env vars stripped from sandbox execution. API keys, tokens, and secrets never reach the subprocess.
- Zero network access: no data sent anywhere. No analytics, no heartbeat, no phone-home. Fully air-gapped operation.
- Source-available: Elastic License 2.0. Full source on GitHub. Audit the code yourself. No compiled blobs.
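The denylist idea is simple to sketch. The patterns below are assumptions for illustration; the real list covers 60+ variables:

```javascript
// Illustrative env scrubbing before spawning a sandboxed subprocess.
const DENY = [/KEY/i, /TOKEN/i, /SECRET/i, /PASSWORD/i, /^AWS_/];

function scrubEnv(env) {
  return Object.fromEntries(
    Object.entries(env).filter(([name]) => !DENY.some((re) => re.test(name)))
  );
}

const example = { PATH: '/usr/bin', OPENAI_API_KEY: 'sk-example', HOME: '/home/dev' };
console.log(scrubEnv(example)); // PATH and HOME survive; the API key is stripped
```

In practice the scrubbed object would be passed as the `env` option to `child_process.spawn`, so the subprocess never sees the denied variables.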

Two commands, two minutes

Claude Code (recommended):

```shell
/plugin marketplace add mksglu/context-mode
```

All platforms:

```shell
npm install -g context-mode
```

Source-available · Elastic License 2.0 · Node.js 18+ · Works with Bun


For teams

context-mode for Enterprise

The source-available plugin handles individual developers. For teams that need visibility, continuity across devices, and compliance, we're building a managed layer.

Context as a Service
API

Your AI tools produce session data that no other tool can see. We expose it as an API. CI/CD pipelines query it before deploy. Code review tools show AI decision trails. Compliance dashboards pull audit logs.

The data layer for AI-assisted development. Twilio model: plugin free, API paid.

For your EM

"Which of my 10 engineers are struggling with AI tools? Who needs TDD coaching? Where are we wasting cycles?"

56 metrics, 8 personas, 144 actionable insights.

For your CISO

"What did the AI do? Which files did it access? Did it run anything dangerous? Can we prove it for SOC2?"

AI audit trail. Automated compliance reports.

For your DevOps

"AI edited 47 files in this PR, 3 have no tests, 4 edit-run-fix cycles on auth.ts. Should we merge?"

GitHub Actions. AI risk scoring. Deploy gates.

For your developers

"I was debugging auth.ts yesterday on my laptop. Now I'm on my desktop. Where was I?"

Cloud sessions. Any device, any time. Full history.

We're onboarding design partners

5 spots for teams that want to shape the product. Free access during the design phase.

Book a 15-min call → bm.ksglu@gmail.com