Homelab AI: Part 3 – Deploying an AI platform (Ollama, WebUI)

Previous Post: Homelab AI: Part 2 – OS/Software Build

With the operating system deployed and the NVIDIA drivers in working order, the next step was to deploy Ollama and Open WebUI. Ollama lets you chat and build with open models, which are well suited to self-hosted deployment. Open WebUI provides an easy-to-use web front-end to Ollama.

Ollama

Deploying Ollama is easy; it only takes a single command:

curl -fsSL https://ollama.com/install.sh | sh

If Ollama deploys correctly and starts running, you should be able to access it at http://localhost:11434 in its default configuration. To test this from the command line:

# ss -antp | grep :11434
# curl http://localhost:11434 -v
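
You can also query Ollama's REST API directly; for example, the version and installed-model endpoints should both return JSON (with no models listed yet at this stage):

# curl http://localhost:11434/api/version
# curl http://localhost:11434/api/tags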

In its default configuration, Ollama runs as a systemd service and will start/stop with the system automatically. You can also use Ollama from the command line to manage and run models.
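
As the installer registers Ollama as a systemd service, the usual systemctl commands can be used to check or control it if needed:

# systemctl status ollama
# systemctl restart ollama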

To get a model for Ollama to use locally, pick one from the list of models here and use ollama pull to download it. For example:

# ollama pull deepseek-r1

This will pull the default deepseek-r1 model from the Ollama library. Pulling without a tag downloads the default variant of that model family; you can also pick a specific size or quantisation by appending a tag. For example, if you wanted deepseek-r1:8b:

# ollama pull deepseek-r1:8b

You can list all locally installed models using ollama list, and remove local models using ollama rm modelname. To run a model from the command line, use ollama run modelname (you can add the --verbose flag if you want to see stats on load time, token generation speed, etc.).

To quit an active session with a model, type /bye in the session window.
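
Putting that together, a typical command-line session (using the model pulled earlier) looks something like this:

# ollama list
# ollama run deepseek-r1:8b --verbose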

Open WebUI

Open WebUI will form the front-end of the LLM infrastructure I’m setting up. It provides a nice front end with lots of additional features, like RAG integration, web search, model management, and MCP support (which will be useful if I want to extend the functionality of my deployment at a later point).

Open WebUI is available as a Docker image, so I chose to use that as the deployment method.

# apt update && apt-cache search docker
docker.io - Linux container runtime
# apt install docker.io
# systemctl enable docker
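
It's worth confirming Docker itself is working before adding GPU support; the standard hello-world image makes a quick smoke test:

# docker run --rm hello-world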

To enable NVIDIA GPU support for Docker workloads, I needed to follow the guidance here and install some additional packages:

# apt install nvidia-container-toolkit
# nvidia-ctk runtime configure --runtime=docker
# systemctl restart docker
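
To confirm the GPUs are visible from inside a container, you can run nvidia-smi in a throwaway container (this is roughly the sample workload suggested in the NVIDIA Container Toolkit docs); both 3090s should appear in the output:

# docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi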

Once Docker had restarted, the next step was to pull the Open WebUI Docker image and run it:

# docker pull ghcr.io/open-webui/open-webui:main

If you're running Open WebUI and Ollama on the same host, the simplest approach is to use Docker's internal networking for connectivity:

# docker run -d -p 3000:8080 --gpus all --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
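
Once the container is up, it should show as running and mapped to port 3000:

# docker ps --filter name=open-webui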

This was perfect for me, as I only wanted to expose Ollama to my WebUI instance. If you want to use Ollama with WebUI on another host, bear in mind that Ollama has no authentication on its open port, so you'd want to lock it down to a private network or use firewall rules to prevent anything other than your front end from connecting to it directly. It's worth reading through the Open WebUI docs if you want to do this (or change the configuration from the single-host build I'm using here).
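
As a rough sketch, if Ollama were listening on its LAN address, a couple of ufw rules along these lines (the IP address is a placeholder for your front-end host) would restrict access to port 11434:

# ufw allow from 192.168.1.50 to any port 11434 proto tcp
# ufw deny 11434/tcp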

If you hit problems with running both on the same host, it’s worth checking out the Open WebUI troubleshooting docs here.
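
If the container is running but the UI isn't reachable (or can't see Ollama), the container logs are usually the first place to look:

# docker ps
# docker logs --tail 50 open-webui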

If you're using Open WebUI in its default configuration, you can log in on port 3000. You'll be prompted to create a username and password before you can start configuring your instance and interacting with models.

Next: Homelab AI: Part 4 – Benchmarking & Testing

Homelab AI: Part 2 – OS/Software Build

Previous Post: Homelab AI: Part 1 – Overview & Hardware

In this post I’ll detail the configuration I used to set up my ML host and the steps I followed to make it work.

I decided to make this box dual-boot, with one 250GB SSD for Windows 10 and the 1TB NVMe for Ubuntu.

Windows 10 Pro

This might seem an odd choice, but I wanted a Windows 10 disk for troubleshooting and benchmarking if needed.

The Windows 10 deployment isn't for running AI, so I just deployed Windows 10 Pro, the latest NVIDIA drivers for the 3090s, a copy of Steam (and then an install of 3DMark) and the Armoury Crate/iCUE utilities, and used this disk to confirm both of my 3090s were stable and in working order by subjecting them to some graphical benchmarking. I then turned off all RGB on my components to stop the under-desk disco taking place within the case 😉

Ubuntu 24.04.3 LTS

Ubuntu seemed like a good choice of OS as I'm familiar with Debian and use it daily, and Ubuntu has more widespread support for some of the packages and software used across the wider ecosystem.

I chose to use 24.04.3 LTS Server in a headless deployment – no GUI, minimal software packages deployed, and OpenSSH installed so I could log into the box and run everything remotely. The rationale behind a headless deployment was to use as little video memory and as few resources as possible, freeing them up for AI/ML workloads.

Once I had the OS deployed onto the NVMe SSD, I confirmed both GPUs were detected using:

lspci -vnn | grep -E "NVIDIA.*VGA|VGA.*NVIDIA" -A 24

I then installed the latest NVIDIA drivers for Ubuntu following the guidance here:

apt-get install nvidia-driver-580 nvidia-utils-580

After a reboot, I could confirm that both cards were detected and the drivers were installed correctly using the NVIDIA SMI utility:
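
nvidia-smi

If everything is working, both 3090s should appear in the output with the expected driver version and 24GB of VRAM each.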

In the next part of this guide, I’ll run through the configuration of the front-end and back-end components needed to start running and querying Large Language Models (LLMs).

Next: Homelab AI: Part 3 – Deploying an AI platform (Ollama, WebUI)

Homelab AI: Part 1 – Hardware Build

I've been meaning to experiment more with AI, but wanted to avoid paying for subscriptions to do it. I also wanted to experiment with as wide a variety of models as possible. I had some spare hardware sitting around from my last gaming PC build and thought it might be a good idea to try building a host for ML workloads.

My objectives were:

  1. Create self-hosted capacity for running these workloads whilst spending as little as possible by re-using existing hardware
  2. Learn more about the dependencies and requirements for deploying and running AI/ML locally
  3. Experiment with as wide a variety of models as possible
  4. Retain control over the data/input used and data exported
  5. Develop an understanding of the most efficient/performant configuration for running this type of workload locally

Hardware Install/Build:

For the host buildout, I tried to re-use as much existing hardware as I had available. The only purchases at this stage were an additional 3090 FE and a new 1000W PSU (as I was concerned that two 3090s would be too much power draw for the spare 800W unit I had).

I'd recently upgraded to an RTX 5090 in my main PC and had an RTX 3090 FE going spare. From researching how others self-host their ML workloads, this card is still popular given its relatively low cost, decent performance and large amount of VRAM. There was also lots of positive feedback about pairing two of these cards for AI/ML, so I decided to buy another one from eBay to try this out.

For motherboard/CPU/RAM, I had an Asus Prime X570-Pro and a Ryzen 5800X3D. Both are a good fit as they support PCIe 4.0 (also the spec supported by the GPUs) and the motherboard has 2x PCIe x16 slots (capable of running both cards at the same time). I had 64GB of DDR4 3600 RAM available (a nice starting point given the current cost of DDR4/DDR5 RAM). The motherboard also has 6x SATA connectors, which is handy for connecting multiple drives (as well as the two hotswap bays in the case).

Storage was provided by a mixture of SSDs I had spare. I used a 1TB Crucial NVMe drive along with 2x 250GB Samsung 870 EVO SATA SSDs for boot drives, and a 4TB Samsung 860 QVO for extra capacity.

For the case, I used a spare Cooler Master HAF XB, which has great airflow and lots of space (or so I thought) for all the components I intended to use, as well as being a really easy case to work on and move around. I reused a mixture of Corsair and Thermaltake 140mm/120mm fans to provide airflow.

I'd forgotten how long the 3090 FE is as a card (mainly because the 5090 that replaced it was even longer). To get both 3090s into the case, I had to move the 140mm intake fans from inside the case to the outside, mounted between the case and the front plastic fascia. Luckily this worked without the fans fouling on the fascia or the external edges, and left me with enough room to mount both cards.

Given the width of the 3090 cards (three slots each), they completely obscured the other PCIe slots on the motherboard, meaning I'd be unable to install any other cards (such as a 10Gb NIC or an additional NVIDIA GPU).

Specs:
CPU: AMD Ryzen 5800X3D
Motherboard: Asus Prime X570-Pro
RAM: 64GB DDR4 3600
GPU: 2 x NVIDIA RTX 3090 FE
Storage: 5.5TB (2 x 250GB, 1 x 1TB, 1 x 4TB SSD)
Case: Coolermaster HAF XB
PSU: Corsair RM1000x (1000W)

Next: Homelab AI: Part 2 – OS/Software Build