Introduction#
When building information retrieval and generative AI applications, the Retrieval-Augmented Generation (RAG) approach has gained increasing popularity among developers due to its ability to retrieve relevant information from knowledge bases and generate accurate answers. However, achieving an end-to-end local RAG service requires not only the right model but also the integration of a robust user interface and an efficient inference framework.
Utilizing an easily deployable Docker approach can greatly simplify model management and service integration when constructing a local RAG service. Here, we rely on the user interface and model inference service provided by Open WebUI, and introduce the `bge-m3` embedding model via Ollama to vectorize documents for retrieval, thereby helping Qwen2.5 generate more precise answers.
In this article, we will discuss how to quickly launch Open WebUI via Docker, synchronize Ollama's RAG capabilities, and implement an efficient document retrieval and generation system in conjunction with the Qwen2.5 model.
Project Overview#
This project will utilize the following key tools:
- Open WebUI: Provides a web interface for user interaction with the model.
- Ollama: Used for managing embedding and large language model inference tasks. The `bge-m3` model in Ollama will be used for document retrieval, while Qwen2.5 will be responsible for answer generation.
- Qwen2.5: The Qwen2.5 series launched by Alibaba provides the natural language generation for the retrieval-augmented generation service.
To implement the RAG service, we need the following steps:
- Deploy Open WebUI as the user interaction interface.
- Configure Ollama to efficiently schedule the Qwen2.5 series models.
- Use the `bge-m3` embedding model configured in Ollama to vectorize documents for retrieval.
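The three steps combine into one loop: split and embed documents, retrieve the chunks closest to the question, and feed them to the generator as context. Below is a self-contained toy sketch of that loop: word-overlap similarity stands in for `bge-m3` embeddings, and the returned prompt is what would be sent to Qwen2.5. Nothing here is Open WebUI's actual code; it only illustrates the shape of the pipeline.

```python
def embed(text: str) -> set[str]:
    # Toy stand-in for a real embedding model: a bag of lowercase words.
    return set(text.lower().split())

def similarity(a: set[str], b: set[str]) -> float:
    # Jaccard overlap as a stand-in for cosine similarity on real vectors.
    return len(a & b) / (len(a | b) or 1)

def build_prompt(question: str, chunks: list[str], k: int = 2) -> str:
    """Retrieve the k chunks most similar to the question and wrap
    them into a context-grounded prompt for the generator."""
    q = embed(question)
    best = sorted(chunks, key=lambda c: similarity(q, embed(c)), reverse=True)[:k]
    context = "\n".join(best)
    # In the real system, this prompt goes to Qwen2.5 via Ollama.
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

chunks = ["the sky is blue", "grass is green", "the sun is bright"]
print(build_prompt("what color is the sky", chunks, k=1))
```

The rest of the article replaces each stub with a real component: Open WebUI drives the loop, `bge-m3` does the embedding, and Qwen2.5 does the generation.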
Deploying Open WebUI#
Open WebUI provides a simple Dockerized solution, allowing users to start the web interface directly without manually configuring a large number of dependencies.
First, ensure that Docker is installed on the server. If not, you can quickly install it using the following command:
```bash
curl https://get.docker.com | sh
```
Then create a directory to save Open WebUI's data, so that the data will not be lost after project updates:
```bash
sudo mkdir -p /DATA/open-webui
```
Next, we can start Open WebUI with the following command:
```bash
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v /DATA/open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main
```
If you want to run Open WebUI with Nvidia GPU support, you can use the following command:
```bash
docker run -d -p 3000:8080 \
  --gpus all \
  --add-host=host.docker.internal:host-gateway \
  -v /DATA/open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:cuda
```
Here, we expose the Open WebUI service on port 3000 of the machine, which can be accessed via the browser at http://localhost:3000 (for remote access, use the server's public IP and open port 3000). /DATA/open-webui is the data storage directory; you can adjust this path as needed.
Of course, in addition to the Docker installation method, you can also install Open WebUI via pip, source compilation, Podman, etc. For more installation methods, please refer to the Open WebUI official documentation.
Basic Settings#
- Enter your account information to register, and be sure to set a strong password.
Important: The first registered user will automatically be set as the system administrator, so please ensure you are the first registered user.
- Click the avatar in the lower left corner and select the admin panel.
- Click on settings in the panel.
- Disable new user registration (optional).
- Click save in the lower right corner.
Configuring Ollama and Qwen2.5#
Deploying Ollama#
Install Ollama on the local server. Ollama currently provides various installation methods; please refer to Ollama's official documentation to download and install version 0.3.11 or later (Qwen2.5 is supported starting from version 0.3.11). Installation details can be found in an article I previously wrote: Ollama: From Beginner to Advanced.
Start the Ollama service (if started via Docker, this is not necessary, but port 11434 must be exposed):
```bash
ollama serve
```
Once the Ollama service is started, you can connect to it at http://localhost:11434.
The Ollama Library provides semantic vector models (such as `bge-m3`) and various text generation models (including Qwen2.5). Next, we will configure Ollama to meet the needs of document retrieval and question-answer generation for this project.
Downloading the Qwen2.5 Model#
To install Qwen2.5 via Ollama, run the `ollama pull` command directly in the command line. For example, to download the 72B model of Qwen2.5, use:
```bash
ollama pull qwen2.5:72b
```
This command will fetch the Qwen2.5 model from Ollama's model repository and prepare the runtime environment.
Qwen2.5 offers various model sizes, including 72B, 32B, 14B, 7B, 3B, 1.5B, and 0.5B. You can choose the appropriate model based on your needs and GPU memory size. I am using a server with 4x V100, so I can directly choose the 72B model. If you need faster output and can accept a slight loss in quality, use the quantized version `qwen2.5:72b-instruct-q4_0`; if you can accept slower output, use `qwen2.5:72b-instruct-q5_K_M`. On the 4x V100 server, token generation with the `q5_K_M` model lags noticeably, but I still chose it to evaluate Qwen2.5's capabilities.
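To see why quantization matters here, a rough back-of-the-envelope estimate (my own approximation: it counts only the weights and ignores KV cache, activations, and runtime overhead) is parameters × bits per weight ÷ 8:

```python
def approx_weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough weight footprint in GB: parameters * bits per weight / 8.
    Ignores KV cache, activations, and runtime overhead."""
    return n_params * bits_per_weight / 8 / 1e9

# 72B weights at different precisions (q4_0 is roughly 4 bits per weight,
# q5_K_M roughly 5.5 on average)
print(approx_weight_gb(72e9, 16))   # fp16:   144.0 GB -- exceeds 4x 32 GB V100
print(approx_weight_gb(72e9, 4))    # q4_0:    36.0 GB
print(approx_weight_gb(72e9, 5.5))  # q5_K_M: ~49.5 GB -- fits across 4x V100
```

By this estimate, the fp16 weights alone would not fit in 128 GB of combined V100 memory, which is why the quantized variants are the practical choice on this hardware.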
For personal computers with less memory, it is recommended to use the 14B or 7B models, which can be downloaded with the following commands:
```bash
ollama pull qwen2.5:14b
```
or
```bash
ollama pull qwen2.5:7b
```
If you have both Open WebUI and Ollama services running, you can also download the model from the admin panel.
Downloading the bge-m3 Model#
Download the `bge-m3` model in Ollama, which is used for document vectorization. Run the following command in the command line (or download it in the Open WebUI interface):
```bash
ollama pull bge-m3:latest
```
At this point, we have completed the configuration of Ollama, and next we will configure the RAG service in Open WebUI.
RAG Integration and Configuration#
Configuring Ollama's RAG Interface in Open WebUI#
Accessing the Open WebUI Admin Interface#
After starting Open WebUI, you can directly access the service address through a web browser, log in to your admin account, and then enter the admin panel.
Setting the Ollama Interface#
In the Open WebUI admin panel, click on Settings, and you will see options for external connections. Ensure that the Ollama API address is `host.docker.internal:11434`, then click the verify connection button on the right to confirm that the Ollama service is reachable.
Setting the Semantic Vector Model#
In the Open WebUI admin panel, click on Settings, then click on Documents, and complete the following steps:
- Set the semantic vector model engine to Ollama.
- Set the semantic vector model to `bge-m3:latest`.
- The remaining settings can be kept at their defaults; here I set the maximum upload size to 10 MB, the maximum upload count to 3, Top K to 5, the chunk size and chunk overlap to 1500 and 100 respectively, and enabled PDF image processing.
- Click save in the lower right corner.
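The chunk size and chunk overlap settings above control how a document is split before embedding. As a simplified illustration (a character-based sketch of my own; Open WebUI's actual splitter is token- and separator-aware), overlapping chunks ensure that a sentence cut at a boundary still appears whole in at least one chunk:

```python
def split_text(text: str, chunk_size: int = 1500, overlap: int = 100) -> list[str]:
    """Character-based splitter: consecutive chunks share `overlap`
    characters. Illustrative only -- not Open WebUI's real splitter."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = split_text("x" * 4000)
print(len(chunks))                          # 3
print([len(c) for c in chunks])             # [1500, 1500, 1200]
print(chunks[0][-100:] == chunks[1][:100])  # True: chunks overlap by 100 chars
```

Larger chunks give the model more context per retrieved passage; larger overlap reduces the chance of losing information at chunk boundaries, at the cost of some duplicated text in the index.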
Testing the RAG Service#
Now you have a complete local RAG system. You can enter any natural language question in the main interface of Open WebUI and upload the corresponding document. The system will call the semantic vector model to vectorize the document, retrieve the most relevant chunks, and then use the Qwen2.5 model to generate an answer from them and return it to the user.
In the user chat interface of Open WebUI, upload the document you want to retrieve from, then enter your question and click send. Open WebUI will call Ollama's `bge-m3` model for document vectorization, and then call the Qwen2.5 model for answer generation.
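Under the hood, the Top K setting amounts to ranking chunk embeddings by cosine similarity to the question's embedding. A toy illustration with hand-made 3-dimensional vectors (real `bge-m3` embeddings are 1024-dimensional; this only shows the ranking step, not the model):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product over the product of vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query: list[float], chunk_vecs: list[list[float]], k: int = 5) -> list[int]:
    """Indices of the k chunk vectors most similar to the query."""
    order = sorted(range(len(chunk_vecs)),
                   key=lambda i: cosine(query, chunk_vecs[i]),
                   reverse=True)
    return order[:k]

# Toy vectors: chunks 0 and 2 point roughly the same way as the query
chunk_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
query = [1.0, 0.05, 0.0]
print(top_k(query, chunk_vecs, k=2))  # [0, 2]
```

The k highest-scoring chunks are then placed into the prompt as context for Qwen2.5.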
Here, I uploaded a simple `txt` file (text generated by GPT) with the following content:
```markdown
# Adventure in the Enchanted Forest

## Introduction
In a distant kingdom's border, there lies a mysterious enchanted forest, rumored to be home to many strange creatures and ancient magic. Few dare to enter, for those who have ventured into the forest have never returned. The story's protagonist is a young adventurer named Evan.

## Chapter One: Evan's Decision
Evan is a young man who loves adventure and exploration. He has heard many stories about the enchanted forest since childhood. Despite his family and friends urging him not to go, he firmly believes that he is destined to uncover the secrets of this forest. One morning, he packs his bag and, with courage and curiosity, sets off toward the forest.

### 1.1 Preparations Before Departure
Before setting off, Evan visits the town's most famous library to research information about the enchanted forest. He discovers an ancient manuscript that records the route into the forest and how to avoid some of its dangerous creatures. Evan copies this manuscript into his notebook, preparing to refer to it when needed.

### 1.2 The First Crossing
As soon as Evan enters the forest, he feels that the atmosphere here is completely different from the outside world. The air is filled with a rich floral scent, along with faint strange sounds. On the first day of crossing the forest, Evan encounters no danger, but he can sense that something is watching him from the shadows.

## Chapter Two: Mysterious Creatures
The next day, Evan continues deeper into the forest. However, he doesn't go far before encountering a strange creature. It is a glowing little deer, radiating a soft blue light. At first, Evan feels surprised and fearful, but the little deer shows no intention of attacking him and instead leads him to a hidden cave.

### 2.1 Secrets in the Cave
Inside the cave, Evan discovers an ancient stone tablet inscribed with strange symbols. The little deer seems to know the meaning of these symbols and guides Evan step by step in deciphering them. It turns out that these symbols record a powerful magic that can help him find lost treasures in the forest.

### 2.2 Receiving Help
Evan decides to accept the little deer's help to unlock the secrets of these symbols. They spend several days in the cave, and Evan learns how to use the resources in the forest to make potions and weapons. Through this, his survival skills in the forest greatly improve.

## Chapter Three: The Final Trial
With the little deer's guidance, Evan finally arrives at the heart of the forest, where there is an ancient altar. It is said that only the bravest adventurers can pass the altar's trials to obtain the ultimate treasure.

### 3.1 Facing Fear
The area around the altar is filled with various traps and illusions. Evan must confront the fears deep within himself to pass these obstacles. Ultimately, he uses his wisdom and courage to overcome everything and earns the right to enter the altar.

### 3.2 Discovering the Treasure
At the center of the altar, Evan discovers a sparkling gem. It is said that this gem possesses the power to change one's fate. Evan picks up the gem and feels its immense power. He knows that this is not just a treasure, but possibly the key to unraveling the secrets of the enchanted forest.

## Conclusion
Evan successfully uncovered part of the secrets of the enchanted forest, becoming a legendary hero. His adventure story also inspires more young adventurers to embark on journeys of exploration into the unknown world with courage and wisdom.
```
Then I asked three questions:
- What strange creature did Evan encounter in the forest?
- What was inscribed on the ancient stone tablet that Evan found in the cave?
- What treasure did Evan discover at the center of the altar?
The following image shows the answers:
Summary#
With the help of Open WebUI and Ollama, we can easily build an efficient and intuitive local RAG system. By using the `bge-m3` semantic vector model for text vectorization and combining it with the Qwen2.5 generation model, users can handle document retrieval and augmented generation tasks efficiently in a unified web interface. This not only protects data privacy but also significantly enhances the localization capabilities of generative AI applications.