Using Ollama With Saarthi
Saarthi supports running models locally using Ollama. This provides privacy, offline access, and potentially lower costs, but requires more setup and a powerful computer.
Website: https://ollama.com/
Setting up Ollama
- 
Download and Install Ollama: Download the Ollama installer for your operating system from the Ollama website. Follow the installation instructions. Make sure Ollama is running ollama serve
- 
Download a Model: Ollama supports many different models. You can find a list of available models on the Ollama website. Some recommended models for coding tasks include: - codellama:7b-code(good starting point, smaller)
- codellama:13b-code(better quality, larger)
- codellama:34b-code(even better quality, very large)
- qwen2.5-coder:32b
- mistralai/Mistral-7B-Instruct-v0.1(good general-purpose model)
- deepseek-coder:6.7b-base(good for coding tasks)
- llama3:8b-instruct-q5_1(good for general tasks)
 To download a model, open your terminal and run: ollama pull <model_name>For example: ollama pull qwen2.5-coder:32b
- 
Configure the Model: by default, Ollama uses a context window size of 2048 tokens, which is too small for Saarthi requests. You need to have at least 12k to get decent results, ideally - 32k. To configure a model, you actually need to set its parameters and save a copy of it. Load the model (we will use qwen2.5-coder:32bas an example):ollama run qwen2.5-coder:32bChange context size parameter: /set parameter num_ctx 32768Save the model with a new name: /save your_model_name
- 
Configure Saarthi: - Open the Saarthi sidebar icon.
 - Click the settings gear icon ().
- Select "ollama" as the API Provider.
- Enter the Model name from the previous step (e.g., your_model_name).
- (Optional) You can configure the base URL if you're running Ollama on a different machine. The default is http://localhost:11434.
- (Optional) Configure Model context size in Advanced settings, so Saarthi knows how to manage its sliding window.
 
Tips and Notes
- Resource Requirements: Running large language models locally can be resource-intensive. Make sure your computer meets the minimum requirements for the model you choose.
- Model Selection: Experiment with different models to find the one that best suits your needs.
- Offline Use: Once you've downloaded a model, you can use Saarthi offline with that model.
- Token Tracking: Saarthi tracks token usage for models run via Ollama, helping you monitor consumption.
- Ollama Documentation: Refer to the Ollama documentation for more information on installing, configuring, and using Ollama.