Getting started in RStudio
- Brief overview
- Projects
- GitHub
- (very brief) Quarto/this blog
Installing local LLMs
Ollama
Ollama is an open-source framework that allows users to run and manage large language models (LLMs) locally on their machines.
Download and install ollama from https://ollama.com (available for Mac, Linux, and Windows).
Run ollama for the first time; you will see a llama icon in your menu bar.
Now you can run your first test with:

```bash
ollama run llama3
```

This will download the `llama3` model and start an interactive chat session. When you are done, end the chat with `/bye`.
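Under the hood, ollama also serves a local HTTP API (on port 11434 by default), which is what Open WebUI and Danswer will talk to later in this tutorial. As a quick sanity check that the server is up, you can query it from the terminal; this is a minimal sketch assuming the default port and that `llama3` has already been downloaded:

```bash
# Request a single, non-streaming completion from the local ollama server
curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Say hello in one short sentence.",
  "stream": false
}'
```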
You can download many other models using the terminal commands below:
| Model | Parameters | Size | Download |
|---|---|---|---|
| Llama 3 | 8B | 4.7GB | `ollama run llama3` |
| Llama 3 | 70B | 40GB | `ollama run llama3:70b` |
| Phi 3 Mini | 3.8B | 2.3GB | `ollama run phi3` |
| Phi 3 Medium | 14B | 7.9GB | `ollama run phi3:medium` |
| Gemma | 2B | 1.4GB | `ollama run gemma:2b` |
| Gemma | 7B | 4.8GB | `ollama run gemma:7b` |
| Mistral | 7B | 4.1GB | `ollama run mistral` |
| Moondream 2 | 1.4B | 829MB | `ollama run moondream` |
| Neural Chat | 7B | 4.1GB | `ollama run neural-chat` |
| Starling | 7B | 4.1GB | `ollama run starling-lm` |
| Code Llama | 7B | 3.8GB | `ollama run codellama` |
| Llama 2 Uncensored | 7B | 3.8GB | `ollama run llama2-uncensored` |
| LLaVA | 7B | 4.5GB | `ollama run llava` |
| Solar | 10.7B | 6.1GB | `ollama run solar` |
Note: You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.
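A couple of other ollama commands are handy for managing the models in the table above: `pull` downloads a model without starting a chat, and `list` shows what is already installed (`phi3` here is just an example taken from the table):

```bash
# Download a model without opening an interactive chat
ollama pull phi3

# Show the models installed locally and how much disk space they use
ollama list

# Remove a model you no longer need
ollama rm phi3
```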
Open WebUI graphical chat app for ollama models
It would be nicer if we had a GUI chat interface to interact with the model and could get code responses in syntax-highlighted code blocks. We can do that and more by installing Open WebUI.
Open WebUI can also be used with non-local models like ChatGPT and GitHub Copilot, but we will focus on local LLMs for this tutorial.
Open WebUI: https://github.com/open-webui/open-webui
You will need to have docker installed. The easiest method is to download and install Docker Desktop from here: https://www.docker.com/products/docker-desktop/
- Docker Desktop is free for commercial use by companies with fewer than 250 employees and less than $10 million in annual revenue.
Once Docker Desktop is installed, run it. This will enable you to use the `docker` command from the terminal.
- There are other ways to install Docker; you just need to be able to run the `docker` and `docker compose` commands (a quick check is shown below).
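To confirm both commands are available before moving on (the exact version output will differ on your machine):

```bash
# Both should print version information if Docker is installed and running
docker --version
docker compose version
```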
Install and run Open WebUI with the following command:
```bash
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
```
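The `-d` flag runs the container in the background, so the command returns immediately. If the page in the next step doesn't load, you can check on the container first; these are standard Docker commands, and the container name comes from the `--name` flag above:

```bash
# Confirm the open-webui container is running
docker ps --filter "name=open-webui"

# Follow its logs if the page isn't loading
docker logs -f open-webui
```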
In a web browser, go to http://localhost:3000 and Open WebUI will load.
You can install additional models from the menu.
- Install `llama3:instruct`; we will need it later.
You can save conversation chains
You can select multiple models at once; when you ask a question, it will pick the best-suited model to answer it.
(bonus) For anyone running MacOS 13 or higher, open the link in Safari and from the menu bar, choose File > Add to Dock. This will create a standalone webapp.
- More info here: https://support.apple.com/en-us/104996
Installing Danswer for RAG with ollama models
RAG stands for Retrieval-Augmented Generation. So far our models can answer questions and are great for giving code examples, but they cannot access a web page or pull information from local documents. With Danswer, we can allow ollama models to do exactly that.
Danswer: https://www.danswer.ai/
Danswer GitHub: https://github.com/danswer-ai/danswer
This install is a little more involved than the last.
Go to the Danswer website (https://www.danswer.ai/) and click “Self-host for Free.”
This will take you here: https://docs.danswer.dev/quickstart
We will follow the instructions on that page (and then make some changes)
Clone the Danswer repo:
```bash
git clone https://github.com/danswer-ai/danswer.git
```
Navigate to `danswer/deployment/docker_compose`:

```bash
cd danswer/deployment/docker_compose
```
Edit the file `danswer/deployment/docker_compose/docker-compose.dev.yml`: find `"3000:80"` and change it to `"3001:80"`. This will make the webapp available at http://localhost:3001, since we are already using port `3000` for Open WebUI.
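If you prefer to make that edit from the command line, something like the following should work from within the `docker_compose` directory (macOS/BSD `sed` shown; on Linux drop the empty quotes after `-i`, and note this assumes the port mapping appears quoted exactly as above):

```bash
# Change the published port from 3000 to 3001 in place
sed -i '' 's/"3000:80"/"3001:80"/' docker-compose.dev.yml
```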
To connect danswer to ollama, create a file called `.env` in the `danswer/deployment/docker_compose` directory:

```bash
touch .env
```
- Add the following to the file:
```
GEN_AI_MODEL_PROVIDER=ollama_chat
# Model of your choice
GEN_AI_MODEL_VERSION=llama3:instruct
# Wherever Ollama is running
# Hint: To point Docker containers to http://localhost:11434, use host.docker.internal instead of localhost
GEN_AI_API_ENDPOINT=http://host.docker.internal:11434

# Let's also make some changes to accommodate the weaker locally hosted LLM
QA_TIMEOUT=120 # Set a longer timeout, running models on CPU can be slow
# Always run search, never skip
DISABLE_LLM_CHOOSE_SEARCH=True
# Don't use LLM for reranking, the prompts aren't properly tuned for these models
DISABLE_LLM_CHUNK_FILTER=True
# Don't try to rephrase the user query, the prompts aren't properly tuned for these models
DISABLE_LLM_QUERY_REPHRASE=True
# Don't use LLM to automatically discover time/source filters
DISABLE_LLM_FILTER_EXTRACTION=True
# Uncomment this one if you find that the model is struggling (slow or distracted by too many docs)
# Use only 1 section from the documents and do not require quotes
QA_PROMPT_OVERRIDE=weak
```
- IMPORTANT: End the file with an empty line; the trailing blank line appears to be required.
Make sure Docker Desktop is running.
To start the application, run the following from within `danswer/deployment/docker_compose`:

```bash
docker compose -f docker-compose.dev.yml -p danswer-stack up -d --pull always --force-recreate
```
- This step may take a little while, as it downloads and installs everything.
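Before opening the browser, you can check that the stack came up cleanly; these are standard `docker compose` subcommands using the same file and project name as the `up` command above:

```bash
# List the containers in the danswer-stack project and their status
docker compose -f docker-compose.dev.yml -p danswer-stack ps

# Follow the logs if a container looks unhealthy
docker compose -f docker-compose.dev.yml -p danswer-stack logs -f
```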
Go to http://localhost:3001 to load the webapp
You will need to set up the app before you can start using it:
- Go to “Search / Chat with Knowledge” > “Get started”
- Select the “Custom” tab
- Fill in the following values (none of these fields are optional, even if the form says they are):
Display Name: ollama
Provider Name: ollama
API Base: http://host.docker.internal:11434
Model Names: `llama3:instruct`, `llama3:latest` (you can add more here if you have more models installed)
Default Model: `llama3:instruct`
- Click “Test” to see if all of your settings are correct.
If it’s successful, you are probably good to go; you will know for sure once you run a test chat.
- Click “Enable”
- Click “Setup your first connector”
You can add a document or link to include in your search
Danswer also makes it possible to link popular services like Slack and Google Docs.
Keep your document/website reasonably small; if you try to index all of Google, your connector will fail.
If things don’t work at first, stop the Docker containers completely and run this again:

```bash
docker compose -f docker-compose.dev.yml -p danswer-stack up -d --pull always --force-recreate
```

For some reason this sometimes needs to be run a few times before the `.env` file is picked up. If that still doesn’t work, try restarting ollama and Docker; a command for stopping the stack cleanly is sketched below.
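To stop the stack completely before retrying, the matching `docker compose down` should do it (same file and project name as the `up` command; indexed data lives in Docker volumes and is preserved unless you add `-v`):

```bash
# Stop and remove the danswer-stack containers; named volumes are preserved
docker compose -f docker-compose.dev.yml -p danswer-stack down
```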
(bonus) For anyone running MacOS 13 or higher, open the link in Safari and from the menu bar, choose File > Add to Dock. This will create a standalone webapp.
- More info here: https://support.apple.com/en-us/104996
Code completion and generation using Continue
I haven’t tested this one yet, but it was on the ollama blog and should work well with the local LLMs installed through ollama. Continue acts as your own AI code assistant within your IDE. For now it works with VS Code and JetBrains IDEs.
Continue: https://www.continue.dev/
VS Code extension: https://marketplace.visualstudio.com/items?itemName=Continue.continue
ollama blog post: https://ollama.com/blog/continue-code-assistant
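As a rough, untested sketch of how the pieces would fit together (based on the ollama blog post linked above; the exact schema of Continue's `config.json` may differ between versions), pointing Continue at a local ollama model looks roughly like this:

```json
{
  "models": [
    {
      "title": "Llama 3 (local via ollama)",
      "provider": "ollama",
      "model": "llama3"
    }
  ]
}
```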