Save 40% on AI coding costs.
Automatically.

Trimli AI compresses every prompt before it hits the model — works with Cursor, Continue, Cline, and Claude Code. No SDK or code changes to your AI tools.

40%
avg token savings
6
compression strategies
59/59
accuracy tests passed
0
prompts stored
Three steps. Under a minute.
No SDK changes. No code changes. Just install and point.
1

Install the extension

Search "Trimli AI" in the VS Code Marketplace, or click Install Free above. A local proxy starts on localhost:8765.

2

Point your AI tool

Set your tool's base URL to http://localhost:8765. For Claude Code, enable forward proxy and launch from the VS Code terminal.
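For CLI tools that read the standard base-URL environment variables, step 2 can be a pair of exports in the terminal you launch the tool from. A minimal sketch (the proxy address comes from step 1; whether your specific tool reads these variables depends on the tool):

```shell
# Point tools that honor the standard base-URL env vars
# at the local Trimli proxy instead of the provider's API.
export OPENAI_BASE_URL="http://localhost:8765"
export ANTHROPIC_BASE_URL="http://localhost:8765"

# Any tool launched from this shell (e.g. from the VS Code
# terminal) now sends its requests through the proxy, which
# forwards them — compressed — to OpenAI or Anthropic.
```

Tools with a GUI settings field for the base URL (Cursor, Continue, Cline) can skip the exports and enter `http://localhost:8765` directly.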

3

Watch savings grow

Check the status bar for live savings. Click it to open the dashboard with per-request history, strategy breakdown, and cost estimates.

Works with the tools you use
Any tool that supports a custom base URL or reads the standard base-URL env vars works out of the box.
Cursor
Fully supported
Continue
Fully supported
Cline
Fully supported
Claude Code
Via forward proxy
GitHub Copilot
Partial (LM API)
Windsurf
Not supported
How much will your team save?
Drag the sliders to see estimated monthly savings based on real compression data.
Start free. Scale when ready.
Free
$0
  • All 6 optimization strategies
  • 200K token savings/day
  • Basic dashboard
  • Works with any supported tool
  • No account required
Get started free
Enterprise
$30/seat/mo
  • Everything in Free
  • Team shared context pools
  • SSO (Okta / Azure AD)
  • Audit logs + CFO reports
  • On-premise Docker/Helm
Contact sales
Common questions
Does it store my prompts?
No. The proxy optimizes messages in-flight and forgets them immediately. Nothing is logged, cached, or sent to any third party. Your API key is forwarded directly to OpenAI or Anthropic. The optimizer runs entirely on your machine.
Does it affect response quality?
No. We tested across a 59-test accuracy suite comparing optimized vs. unoptimized responses using an LLM judge. The result: 46% average compression with zero quality degradation. The optimizer only removes redundancy — filler words, repeated sentences, stale context — not meaning.
Does it work with my tool?
If your tool supports a custom base URL or reads the OPENAI_BASE_URL / ANTHROPIC_BASE_URL env vars, it works. Cursor, Continue, Cline, and Claude Code are fully supported. GitHub Copilot has partial support via the VS Code Language Model API. Windsurf is not supported (it ignores proxy settings).
Can we self-host it?
Yes. The Enterprise tier includes Docker Compose and Helm chart deployments for teams that need to keep data inside their VPC. The Python compression service, Redis, PostgreSQL, and Nginx are all containerized and ready to deploy.
How is this different from token monitoring tools?
Most alternatives (Tokenlint, Claude Token Monitor, Copilot Token Tracker) are passive — they show you how many tokens you're using but don't reduce them. Trimli AI actively compresses every prompt before it reaches the model. It's the difference between a fuel gauge and a fuel-efficient engine.
What about streaming responses?
Fully supported. The optimizer only modifies the input (your prompts). Streaming responses flow through unchanged.