Does it store my prompts?
No. The proxy optimizes messages in-flight and forgets them immediately. Nothing is logged, cached, or sent to any third party. Your API key is forwarded directly to OpenAI or Anthropic. The optimizer runs entirely on your machine.
Does it affect response quality?
No. We tested across a 59-test accuracy suite comparing responses to optimized vs. unoptimized prompts, scored by an LLM judge. The result: 46% average compression with zero quality degradation. The optimizer only removes redundancy — filler words, repeated sentences, stale context — not meaning.
Does it work with my tool?
If your tool supports a custom base URL or reads the OPENAI_BASE_URL / ANTHROPIC_BASE_URL env vars, it works. Cursor, Continue, Cline, and Claude Code are fully supported. GitHub Copilot has partial support via the VS Code Language Model API. Windsurf is not supported (it ignores proxy settings).
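For tools that read the environment variables, setup is a one-time shell configuration. The port below is an assumption for illustration — substitute whatever address your local Trimli AI proxy actually listens on:

```shell
# Route OpenAI- and Anthropic-compatible tools through the local proxy.
# localhost:8080 is a placeholder -- check your install for the real port.
export OPENAI_BASE_URL="http://localhost:8080/v1"
export ANTHROPIC_BASE_URL="http://localhost:8080/anthropic"

# Tools that honor these variables (Continue, Cline, etc.) will now send
# requests through the optimizer; your API key is forwarded as before.
echo "$OPENAI_BASE_URL"
```

Add the two `export` lines to your shell profile to make the routing persistent across sessions.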
Can we self-host it?
Yes. The Enterprise tier includes Docker Compose and Helm chart deployments for teams that need to keep data inside their VPC. The Python compression service, Redis, PostgreSQL, and Nginx are all containerized and ready to deploy.
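As a rough sketch of what the Compose deployment looks like, the four containers wire together along these lines. Service names, image tags, and ports here are illustrative assumptions, not the shipped configuration — the Enterprise tier provides the actual files:

```yaml
# Hypothetical docker-compose.yml sketch -- names and ports are placeholders.
services:
  compression:        # Python compression service
    image: trimli/compression:latest
    depends_on: [redis, postgres]
  redis:
    image: redis:7
  postgres:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: changeme   # use a secret manager in production
  nginx:              # fronts the compression service inside your VPC
    image: nginx:stable
    ports:
      - "443:443"
```

The Helm chart targets the same four components for Kubernetes clusters; in both cases traffic terminates at Nginx and never leaves your network.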
How is this different from token monitoring tools?
Most alternatives (Tokenlint, Claude Token Monitor, Copilot Token Tracker) are passive — they show you how many tokens you're using but don't reduce them. Trimli AI actively compresses every prompt before it reaches the model. It's the difference between a fuel gauge and a fuel-efficient engine.
What about streaming responses?
Fully supported. The optimizer only modifies the input (your prompts). Streaming responses flow through unchanged.