Setting up local LLMs using ollama

LLMs
Author

Sven Nelson

Published

June 3, 2024

Getting started in RStudio

  • Brief overview
  • Projects
  • GitHub
  • (very brief) Quarto/this blog

Installing local LLMs

Ollama

Ollama is an open-source framework that allows users to run and manage large language models (LLMs) locally on their machines.

  1. Download and install ollama (available for macOS, Linux, and Windows).

  2. Run ollama for the first time; you will see a llama icon appear in your menu bar.

  3. Now you can run your first test with:

    ollama run llama3

    This will download the model llama3 and start an interactive chat session.

    When you are done, end the chat with /bye

    You can download and run many other models with the terminal commands listed below:

    Model               Parameters  Size    Download
    Llama 3             8B          4.7GB   ollama run llama3
    Llama 3             70B         40GB    ollama run llama3:70b
    Phi 3 Mini          3.8B        2.3GB   ollama run phi3
    Phi 3 Medium        14B         7.9GB   ollama run phi3:medium
    Gemma               2B          1.4GB   ollama run gemma:2b
    Gemma               7B          4.8GB   ollama run gemma:7b
    Mistral             7B          4.1GB   ollama run mistral
    Moondream 2         1.4B        829MB   ollama run moondream
    Neural Chat         7B          4.1GB   ollama run neural-chat
    Starling            7B          4.1GB   ollama run starling-lm
    Code Llama          7B          3.8GB   ollama run codellama
    Llama 2 Uncensored  7B          3.8GB   ollama run llama2-uncensored
    LLaVA               7B          4.5GB   ollama run llava
    Solar               10.7B       6.1GB   ollama run solar

Note: You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.
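Models can also be managed from the terminal without starting a chat, and ollama exposes a local HTTP API on port 11434 that other tools (including the ones set up below) talk to. Here is a quick sketch; the model names are just examples:

    # Download a model without starting a chat, list what is installed, remove one
    ollama pull phi3
    ollama list
    ollama rm phi3

    # Query the local ollama API directly (this is the endpoint GUI tools use)
    curl http://localhost:11434/api/generate -d '{
      "model": "llama3",
      "prompt": "Why is the sky blue?",
      "stream": false
    }'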

Open WebUI graphical chat app for ollama models

It would be nicer if we had a GUI chat interface to interact with the model and could get code responses in syntax-highlighted code blocks. We can do that and more by installing Open WebUI.

Open WebUI can also be used with non-local models like ChatGPT and GitHub Copilot, but we will focus on local LLMs for this tutorial.

Open WebUI: https://github.com/open-webui/open-webui

  1. You will need to have docker installed. The easiest method is to download and install Docker Desktop from here: https://www.docker.com/products/docker-desktop/

    • Docker Desktop is free for personal use and for commercial use at companies with fewer than 250 employees and less than $10 million in annual revenue; larger organizations need a paid subscription.
  2. Once Docker Desktop is installed, run it. This will enable you to use the docker command from the terminal.

    • There are other ways to install Docker; you just need to be able to run the docker and docker compose commands.
  3. Install and run Open WebUI with the following command:

    docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
  4. In a web browser go to http://localhost:3000

    • Open WebUI will load

    • You can install additional models from the menu

      • Install llama3:instruct; we will need it later.
    • You can save conversation chains

    • You can select multiple models when asking a question, and it will pick the model best suited to answer it.

  5. (bonus) For anyone running macOS 13 or higher, open the link in Safari and, from the menu bar, choose File > Add to Dock. This will create a standalone webapp.
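If the page at http://localhost:3000 ever fails to load, or you want to update Open WebUI later, the standard docker commands apply to the container created in step 3 (the container name open-webui comes from the --name flag in that command):

    # Check that the container is running and look at its logs
    docker ps
    docker logs open-webui

    # To update: stop and remove the old container, pull the latest image,
    # then rerun the docker run command from step 3
    docker stop open-webui
    docker rm open-webui
    docker pull ghcr.io/open-webui/open-webui:main

Chat history is stored in the open-webui Docker volume, so it survives recreating the container.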

Installing Danswer for RAG with ollama models

RAG stands for Retrieval-Augmented Generation. So far our models can answer questions and are great for giving code examples, but they cannot access a web page or pull information from local documents. With Danswer, we can allow ollama models to do exactly that.

Danswer: https://www.danswer.ai/

Danswer GitHub: https://github.com/danswer-ai/danswer
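To make the RAG idea concrete before installing anything: at its simplest, you retrieve some relevant text yourself and paste it into the prompt. The sketch below does only that augmentation half by hand against the local ollama API (notes.txt is a hypothetical file standing in for a retrieved document, and jq is used just to build valid JSON). Danswer automates the retrieval, chunking, and source connectors on top of this idea.

    # Stuff a "retrieved" document into the prompt by hand (illustration only)
    curl http://localhost:11434/api/generate -d "$(jq -n \
      --arg ctx "$(cat notes.txt)" \
      --arg q "What were the key findings?" \
      '{model: "llama3",
        prompt: ("Answer using only this context:\n\n" + $ctx + "\n\nQuestion: " + $q),
        stream: false}')"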

This install is a little more involved than the last.

  1. Go to the Danswer website (https://www.danswer.ai/) and click “Self-host for Free.”

  2. Clone the Danswer repo:

    git clone https://github.com/danswer-ai/danswer.git
  3. Navigate to danswer/deployment/docker_compose

    cd danswer/deployment/docker_compose
  4. Edit the file danswer/deployment/docker_compose/docker-compose.dev.yml

    • Find "3000:80" and change it to "3001:80"

    • This will make the webapp available at http://localhost:3001 since we are already using port 3000 for Open WebUI

  5. To connect danswer to ollama, create a file called .env in the danswer/deployment/docker_compose directory:

    touch .env
    • Add the following to the file:
    GEN_AI_MODEL_PROVIDER=ollama_chat
    # Model of your choice
    GEN_AI_MODEL_VERSION=llama3:instruct
    # Wherever Ollama is running
    # Hint: To point Docker containers to http://localhost:11434, use host.docker.internal instead of localhost
    GEN_AI_API_ENDPOINT=http://host.docker.internal:11434
    
    # Let's also make some changes to accommodate the weaker locally hosted LLM
    QA_TIMEOUT=120  # Set a longer timeout, running models on CPU can be slow
    # Always run search, never skip
    DISABLE_LLM_CHOOSE_SEARCH=True
    # Don't use LLM for reranking, the prompts aren't properly tuned for these models
    DISABLE_LLM_CHUNK_FILTER=True
    # Don't try to rephrase the user query, the prompts aren't properly tuned for these models
    DISABLE_LLM_QUERY_REPHRASE=True
    # Don't use LLM to automatically discover time/source filters
    DISABLE_LLM_FILTER_EXTRACTION=True
    # Uncomment this one if you find that the model is struggling (slow or distracted by too many docs)
    # Use only 1 section from the documents and do not require quotes
    # QA_PROMPT_OVERRIDE=weak
    • IMPORTANT: End the file with an empty line; the trailing blank line appears to be required.
  6. Make sure Docker Desktop is running.

  7. To start the application, run the following from within danswer/deployment/docker_compose:

    docker compose -f docker-compose.dev.yml -p danswer-stack up -d --pull always --force-recreate
    • This step may take a while, since it downloads and installs everything.
  8. Go to http://localhost:3001 to load the webapp

    • You will need to set up the app before you can start using it:

      1. Go to “Search / Chat with Knowledge” > “Get started”
      2. Select the “Custom” tab
      3. Fill in the following values (none of these are optional, even if the form says they are):
        • Display Name: ollama

        • Provider Name: ollama

        • API Base: http://host.docker.internal:11434

        • Model Names:

          • llama3:instruct

          • llama3:latest

          • (you can add more here if you have more installed)

        • Default Model: llama3:instruct

      4. Click “Test” to see if all of your settings are correct.
        • If it’s successful, you may be good to go

        • You will know for sure once you test running a chat.

      5. Click “Enable”
      6. Click “Setup your first connector”
        • You can add a document or link to include in your search

        • Danswer also makes it possible to link popular services like Slack and Google docs

        • Keep your document or website reasonably small; if you try to index all of Google, your connector will fail.

  9. If things don’t work at first, stop the docker process completely and run this:

    docker compose -f docker-compose.dev.yml -p danswer-stack up -d --pull always --force-recreate
    • For some reason, this sometimes needs to be run a few times before the .env file is picked up.

    • If that doesn't work, try restarting ollama and Docker.

  10. (bonus) For anyone running macOS 13 or higher, open the link in Safari and, from the menu bar, choose File > Add to Dock. This will create a standalone webapp.
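For reference, the whole Danswer stack can be inspected, stopped, and restarted with the usual docker compose commands, run from danswer/deployment/docker_compose. This is also handy for the troubleshooting in step 9, since the logs show whether the .env settings were actually picked up:

    # See which Danswer containers are running
    docker compose -f docker-compose.dev.yml -p danswer-stack ps

    # Follow the logs (Ctrl+C to stop following)
    docker compose -f docker-compose.dev.yml -p danswer-stack logs -f

    # Stop everything; rerun the 'up' command from step 7 to start again
    docker compose -f docker-compose.dev.yml -p danswer-stack down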

Code completion and generation using Continue

I haven't tested this one yet, but it was featured on the ollama blog and should work well with the local LLMs installed via ollama. Continue acts as your own AI code assistant within your IDE. It currently works with VS Code and JetBrains IDEs.

Continue: https://www.continue.dev/

VS Code extension: https://marketplace.visualstudio.com/items?itemName=Continue.continue

ollama blog post: https://ollama.com/blog/continue-code-assistant
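I haven't set this up myself, so treat the following as a rough sketch based on the Continue documentation and the ollama blog post above; the exact config schema may have changed. After installing the extension, you point Continue at a local model by adding an entry with provider "ollama" to its config.json (typically ~/.continue/config.json):

    {
      "models": [
        {
          "title": "Llama 3 (local)",
          "provider": "ollama",
          "model": "llama3"
        }
      ]
    }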