Setting up local LLMs using ollama

LLMs
Author

Sven Nelson

Published

June 3, 2024

Getting started in RStudio

  • Brief overview
  • Projects
  • GitHub
  • (very brief) Quarto/this blog

Installing local LLMs

Ollama

Ollama is an open-source framework that allows users to run and manage large language models (LLMs) locally on their machines.

  1. Download and install ollama (available for macOS, Linux, and Windows).

  2. Run ollama for the first time; you will see a llama icon appear in your menu bar.

  3. Now you can run your first test with:

    ollama run llama3

    This will download the model llama3 and start an interactive chat session.

    When you are done, end the chat with /bye

    You can download and run many other models with the terminal commands listed below:

    Model               Parameters  Size    Download
    Llama 3             8B          4.7GB   ollama run llama3
    Llama 3             70B         40GB    ollama run llama3:70b
    Phi 3 Mini          3.8B        2.3GB   ollama run phi3
    Phi 3 Medium        14B         7.9GB   ollama run phi3:medium
    Gemma               2B          1.4GB   ollama run gemma:2b
    Gemma               7B          4.8GB   ollama run gemma:7b
    Mistral             7B          4.1GB   ollama run mistral
    Moondream 2         1.4B        829MB   ollama run moondream
    Neural Chat         7B          4.1GB   ollama run neural-chat
    Starling            7B          4.1GB   ollama run starling-lm
    Code Llama          7B          3.8GB   ollama run codellama
    Llama 2 Uncensored  7B          3.8GB   ollama run llama2-uncensored
    LLaVA               7B          4.5GB   ollama run llava
    Solar               10.7B       6.1GB   ollama run solar

Note: You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.
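Models can also be managed from the terminal without starting a chat, and ollama exposes a local HTTP API on port 11434 that other tools (including the ones set up below) talk to. Here is a quick sketch; the model names are just examples:

    # Download a model without starting a chat, list what is installed, remove one
    ollama pull phi3
    ollama list
    ollama rm phi3

    # Query the local ollama API directly (this is the endpoint GUI tools use)
    curl http://localhost:11434/api/generate -d '{
      "model": "llama3",
      "prompt": "Why is the sky blue?",
      "stream": false
    }'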

Open WebUI graphical chat app for ollama models

It would be nicer if we had a GUI chat interface to interact with the model and could get code responses in syntax-highlighted code blocks. We can do that and more by installing Open WebUI.

Open WebUI can also be used with non-local models like ChatGPT and GitHub Copilot, but we will focus on local LLMs for this tutorial.

Open WebUI: https://github.com/open-webui/open-webui

  1. You will need to have docker installed. The easiest method is to download and install Docker Desktop from here: https://www.docker.com/products/docker-desktop/

    • Docker Desktop is free for personal use and for commercial use at companies with fewer than 250 employees and less than $10 million in annual revenue; larger organizations need a paid subscription.
  2. Once Docker Desktop is installed, run it. This will enable you to use the docker command from the terminal.

    • There are other ways to install Docker; you just need to be able to run the docker and docker compose commands.
  3. Install and run Open WebUI with the following command:

    docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
  4. In a web browser go to http://localhost:3000

    • Open WebUI will load

    • You can install additional models from the menu

      • Install llama3:instruct; we will need it later.
    • You can save conversation chains

    • You can select multiple models when asking a question, and it will pick the model best suited to answer it.

  5. (bonus) For anyone running macOS 13 or higher, open the link in Safari and, from the menu bar, choose File > Add to Dock. This will create a standalone webapp.
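If the page at http://localhost:3000 ever fails to load, or you want to update Open WebUI later, the standard docker commands apply to the container created in step 3 (the container name open-webui comes from the --name flag in that command):

    # Check that the container is running and look at its logs
    docker ps
    docker logs open-webui

    # To update: stop and remove the old container, pull the latest image,
    # then rerun the docker run command from step 3
    docker stop open-webui
    docker rm open-webui
    docker pull ghcr.io/open-webui/open-webui:main

Chat history is stored in the open-webui Docker volume, so it survives recreating the container.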

Installing Danswer for RAG with ollama models

RAG stands for Retrieval-Augmented Generation. So far our models can answer questions and are great for giving code examples, but they cannot access a web page or pull information from local documents. With Danswer, we can allow ollama models to do exactly that.

Danswer: https://www.danswer.ai/

Danswer GitHub: https://github.com/danswer-ai/danswer
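To make the RAG idea concrete before installing anything: at its simplest, you retrieve some relevant text yourself and paste it into the prompt. The sketch below does only that augmentation half by hand against the local ollama API (notes.txt is a hypothetical file standing in for a retrieved document, and jq is used just to build valid JSON). Danswer automates the retrieval, chunking, and source connectors on top of this idea.

    # Stuff a "retrieved" document into the prompt by hand (illustration only)
    curl http://localhost:11434/api/generate -d "$(jq -n \
      --arg ctx "$(cat notes.txt)" \
      --arg q "What were the key findings?" \
      '{model: "llama3",
        prompt: ("Answer using only this context:\n\n" + $ctx + "\n\nQuestion: " + $q),
        stream: false}')"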

This install is a little more involved than the last.

  1. Go to the Danswer website (https://www.danswer.ai/) and click “Self-host for Free.”

  2. Clone the Danswer repo:

    git clone https://github.com/danswer-ai/danswer.git
  3. Navigate to danswer/deployment/docker_compose

    cd danswer/deployment/docker_compose
  4. Edit the file danswer/deployment/docker_compose/docker-compose.dev.yml

    • Find "3000:80" and change it to "3001:80"

    • This will make the webapp available at http://localhost:3001 since we are already using port 3000 for Open WebUI

  5. To connect danswer to ollama, create a file called .env in the danswer/deployment/docker_compose directory:

    touch .env
    • Add the following to the file:
    GEN_AI_MODEL_PROVIDER=ollama_chat
    # Model of your choice
    GEN_AI_MODEL_VERSION=llama3:instruct
    # Wherever Ollama is running
    # Hint: To point Docker containers to http://localhost:11434, use host.docker.internal instead of localhost
    GEN_AI_API_ENDPOINT=http://host.docker.internal:11434
    
    # Let's also make some changes to accommodate the weaker locally hosted LLM
    QA_TIMEOUT=120  # Set a longer timeout, running models on CPU can be slow
    # Always run search, never skip
    DISABLE_LLM_CHOOSE_SEARCH=True
    # Don't use LLM for reranking, the prompts aren't properly tuned for these models
    DISABLE_LLM_CHUNK_FILTER=True
    # Don't try to rephrase the user query, the prompts aren't properly tuned for these models
    DISABLE_LLM_QUERY_REPHRASE=True
    # Don't use LLM to automatically discover time/source filters
    DISABLE_LLM_FILTER_EXTRACTION=True
    # Uncomment this one if you find that the model is struggling (slow or distracted by too many docs)
    # Use only 1 section from the documents and do not require quotes
    # QA_PROMPT_OVERRIDE=weak
    • IMPORTANT: End the file with an empty line; the trailing blank line appears to be required.
  6. Make sure Docker Desktop is running.

  7. To start the application, run the following from within danswer/deployment/docker_compose:

    docker compose -f docker-compose.dev.yml -p danswer-stack up -d --pull always --force-recreate
    • This step may take a while, since it downloads and installs everything.
  8. Go to http://localhost:3001 to load the webapp

    • You will need to set up the app before you can start using it:

      1. Go to “Search / Chat with Knowledge” > “Get started”
      2. Select the “Custom” tab
      3. Fill in the following values (none of these are optional, even if the form says they are):
        • Display Name: ollama

        • Provider Name: ollama

        • API Base: http://host.docker.internal:11434

        • Model Names:

          • llama3:instruct

          • llama3:latest

          • (you can add more here if you have more installed)

        • Default Model: llama3:instruct

      4. Click “Test” to see if all of your settings are correct.
        • If it’s successful, you may be good to go

        • You will know for sure once you test running a chat.

      5. Click “Enable”
      6. Click “Setup your first connector”
        • You can add a document or link to include in your search

        • Danswer also makes it possible to link popular services like Slack and Google docs

        • Keep your document or website reasonably small; if you try to index all of Google, your connector will fail.

  9. If things don’t work at first, stop the docker process completely and run this:

    docker compose -f docker-compose.dev.yml -p danswer-stack up -d --pull always --force-recreate
    • For some reason, this sometimes needs to be run a few times before the .env file is picked up.

    • If that doesn't work, try restarting ollama and Docker.

  10. (bonus) For anyone running macOS 13 or higher, open the link in Safari and, from the menu bar, choose File > Add to Dock. This will create a standalone webapp.
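For reference, the whole Danswer stack can be inspected, stopped, and restarted with the usual docker compose commands, run from danswer/deployment/docker_compose. This is also handy for the troubleshooting in step 9, since the logs show whether the .env settings were actually picked up:

    # See which Danswer containers are running
    docker compose -f docker-compose.dev.yml -p danswer-stack ps

    # Follow the logs (Ctrl+C to stop following)
    docker compose -f docker-compose.dev.yml -p danswer-stack logs -f

    # Stop everything; rerun the 'up' command from step 7 to start again
    docker compose -f docker-compose.dev.yml -p danswer-stack down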

Code completion and generation using Continue

I haven't tested this one yet, but it was featured on the ollama blog and should work well with the local LLMs installed via ollama. Continue acts as your own AI code assistant within your IDE. It currently works with VS Code and JetBrains IDEs.

Continue: https://www.continue.dev/

VS Code extension: https://marketplace.visualstudio.com/items?itemName=Continue.continue

ollama blog post: https://ollama.com/blog/continue-code-assistant
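I haven't set this up myself, so treat the following as a rough sketch based on the Continue documentation and the ollama blog post above; the exact config schema may have changed. After installing the extension, you point Continue at a local model by adding an entry with provider "ollama" to its config.json (typically ~/.continue/config.json):

    {
      "models": [
        {
          "title": "Llama 3 (local)",
          "provider": "ollama",
          "model": "llama3"
        }
      ]
    }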