Does it store my prompts?
No. The proxy optimizes messages in-flight and forgets them immediately. Nothing is logged, cached, or sent to any third party. Your API key is forwarded directly to OpenAI or Anthropic. The optimizer runs entirely on your machine.
Does it affect response quality?
No. We tested across a 59-test accuracy suite comparing responses to optimized vs. unoptimized prompts, scored by an LLM judge. The result: 46% average compression with zero quality degradation. The optimizer only removes redundancy — filler words, repeated sentences, stale context — not meaning.
Does it work with my tool?
If your tool supports a custom base URL or reads the OPENAI_BASE_URL / ANTHROPIC_BASE_URL env vars, it works. Cursor, Continue, Cline, and Claude Code are fully supported. GitHub Copilot has partial support via the VS Code Language Model API. Windsurf is not supported (it ignores proxy settings).
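For tools that read the environment variables, setup is a one-time shell configuration. The port below is an assumption for illustration — substitute whatever address your local Trimli AI proxy actually listens on:

```shell
# Route OpenAI- and Anthropic-compatible tools through the local proxy.
# localhost:8080 is a placeholder -- check your install for the real port.
export OPENAI_BASE_URL="http://localhost:8080/v1"
export ANTHROPIC_BASE_URL="http://localhost:8080/anthropic"

# Tools that honor these variables (Continue, Cline, etc.) will now send
# requests through the optimizer; your API key is forwarded as before.
echo "$OPENAI_BASE_URL"
```

Add the two `export` lines to your shell profile to make the routing persistent across sessions.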
Can we self-host it?
Yes. The Enterprise tier includes Docker Compose and Helm chart deployments for teams that need to keep data inside their VPC. The Python compression service, Redis, PostgreSQL, and Nginx are all containerized and ready to deploy.
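As a rough sketch of what the Compose deployment looks like, the four containers wire together along these lines. Service names, image tags, and ports here are illustrative assumptions, not the shipped configuration — the Enterprise tier provides the actual files:

```yaml
# Hypothetical docker-compose.yml sketch -- names and ports are placeholders.
services:
  compression:        # Python compression service
    image: trimli/compression:latest
    depends_on: [redis, postgres]
  redis:
    image: redis:7
  postgres:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: changeme   # use a secret manager in production
  nginx:              # fronts the compression service inside your VPC
    image: nginx:stable
    ports:
      - "443:443"
```

The Helm chart targets the same four components for Kubernetes clusters; in both cases traffic terminates at Nginx and never leaves your network.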
How is this different from token monitoring tools?
Most alternatives (Tokenlint, Claude Token Monitor, Copilot Token Tracker) are passive — they show you how many tokens you're using but don't reduce them. Trimli AI actively compresses every prompt before it reaches the model. It's the difference between a fuel gauge and a fuel-efficient engine.
What about streaming responses?
Fully supported. The optimizer only modifies the input (your prompts). Streaming responses flow through unchanged.