Intelligent Context Condensing
How It Works
As your conversation with Saarthi grows, it might approach the context window limit of the underlying AI model. When this happens, older messages would typically be removed to make space. Intelligent Context Condensing aims to prevent this abrupt loss by:
- Summarizing: Using an AI model, it condenses earlier parts of the conversation.
- Retaining Essentials: The goal is to reduce the overall token count while keeping the key information from the summarized messages.
- Maintaining Flow: This allows the AI to have a more coherent understanding of the entire conversation, even very long ones.
Important Considerations:
- Summarization Impact: While original messages are preserved if you use Checkpoints to rewind, the summarized version is what's used in ongoing LLM calls to keep the context manageable.
- Cost: The AI call to perform the summarization incurs a cost. This cost is included in the context condensing metrics displayed in the UI.
Configuration
Intelligent Context Condensing is enabled by default and offers several configuration options:
- Open Saarthi settings ( icon in the top right corner of the Saarthi panel).
- Navigate to the "Context" settings section.
- Configure the available options:
- Automatically trigger intelligent context condensing: Enabled by default, this controls whether condensing happens automatically
- Threshold to trigger intelligent context condensing: A percentage slider (default 100%) that determines when condensing activates based on context window usage
- API Configuration for Context Condensing: Choose which API configuration to use for condensing operations (defaults to your current active configuration)
- Custom Context Condensing Prompt: Customize the system prompt used for context condensing operations
 
 Intelligent Context Condensing configuration options: automatic triggering toggle, threshold slider, API configuration selection, and custom prompt customization.
Intelligent Context Condensing configuration options: automatic triggering toggle, threshold slider, API configuration selection, and custom prompt customization.
Controlling and Understanding Context Condensing
Saarthi provides several ways to control and understand the Intelligent Context Condensing feature:
Controlling Context Condensing
- Automatic Threshold: The threshold slider in "Context" settings allows you to define a percentage (e.g., 80%) of context window usage. Saarthi will attempt to condense the context automatically when the conversation reaches this level of capacity.
- API Configuration: Select which API configuration to use for context condensing operations. This allows you to use a different provider or model specifically for condensing if desired.
- Custom Prompts: Modify the system prompt used for condensing to better suit your workflow or to emphasize certain aspects of conversation summarization.
- Manual Trigger: A Condense Context button is available at the top of the task, positioned to the right of the context bar. This allows you to initiate the context condensing process at any time.
 The Manual Condense Context button (highlighted with a yellow arrow) is easily accessible for manual control.
The Manual Condense Context button (highlighted with a yellow arrow) is easily accessible for manual control.
Understanding Context Condensing Activity
- Context Condensing Metrics: When context condensing occurs, Saarthi displays:
- The context token counts before and after context condensing.
- The cost associated with the context condensing AI call.
- An expandable summary detailing what was condensed (this information is part of the ContextCondenseRowcomponent visible in the chat history).
 
 
After context condensing, a message indicates the context has been condensed, showing token changes and cost.
- Visual Indicators:
- A progress indicator ("Condensing context...") is shown in the chat interface while context condensing is active.
 
The "Condensing context..." indicator appears in the chat during the process.
- The task header also displays the current context condensing status.
- The ContextWindowProgressbar offers a visual representation of token distribution, including current usage, space reserved for the AI's output, available space, and raw token numbers.
- Interface Clarity: The "Condense Context" button includes a tooltip explaining its function, available in all supported languages.
Technical Implementation
Token Counting
Saarthi uses a sophisticated token counting system that:
- Employs native token counting endpoints when available (e.g., Anthropic's API)
- Falls back to tiktoken estimation if API calls fail
- Provides accurate counting for different content types:
- Text content: Uses word-based estimation with punctuation and newline overhead
- Image content: Uses a conservative estimate of 300 tokens per image
- System prompts: Includes additional overhead for structural elements
 
Context Window Management
- By default, 30% of the context window is reserved (20% for model output and 10% as a safety buffer), leaving 70% available for conversation history.
- This reservation can be overridden by model-specific settings
- The system automatically calculates available space while maintaining this reservation