This tutorial will guide you through setting up fully local text-to-speech and speech-to-text with Wyoming Whisper and Piper. We'll be doing this on a Proxmox 8.1.4 virtual environment, within an Ubuntu 22.04 LXC. Thanks to CUDA, this will be blazing fast, but it requires a compatible NVIDIA GPU, for example the GeForce RTX 3060.
The tutorial is based upon this modest rig:
What really matters here is that you have a GPU that supports CUDA, and that the GPU is compatible with your motherboard. If you do, your motherboard and CPU are also probably compatible with the Proxmox requirements for virtualization.
Outline:
Generally speaking, you may need to enable virtualization support in your BIOS. The problem is, different vendors and different chipsets have different menus, names and default settings. I found this general guide for Proxmox 7.x and used it as a reference for Proxmox 8.1.4.
In my particular BIOS, I did the following: first, I disabled CSM. Then I changed the following under the Advanced menu:
I may have changed other virtualization options a year ago when I first set up Proxmox, but these were the ones I tweaked specifically for virtualizing the GPU. I'm not sure all of these steps were strictly necessary, but everything worked on my first try after making them.
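Once Proxmox boots, you can sanity check from the host shell that virtualization is actually enabled. These two commands are a quick test (vmx is the Intel CPU flag, svm the AMD one; the second command assumes the IOMMU messages end up in the kernel log):
# Should print a number greater than 0 if hardware virtualization is enabled
egrep -c '(vmx|svm)' /proc/cpuinfo
# Should show DMAR/IOMMU lines if the IOMMU is enabled
dmesg | grep -e DMAR -e IOMMU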
In order to pass through your GPU to guests, you need the drivers installed on the host. Installing the drivers involves compiling them specifically for your system.
You need to be in the Linux shell of your Proxmox host as root for these steps. Use SSH or connect a keyboard and screen directly to the Proxmox host computer.
For compiling, you need gcc and make:
apt-get install gcc
apt-get install make
For tailoring to your specific system, you need to download the correct kernel headers:
uname -a
This will output something like:
Linux proxmox-node1 6.5.11-8-pve #1 SMP PREEMPT_DYNAMIC PMX 6.5.11-8 (2024-01-30T12:27Z) x86_64 GNU/Linux
Now, use the version number <6.5.11-8-pve> to install the kernel headers like this:
apt-get install pve-headers-6.5.11-8-pve
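If you'd rather not copy the version string by hand, the same install can be done in one line; this assumes uname -r reports a -pve kernel as above:
# Install the headers matching the currently running kernel
apt-get install pve-headers-$(uname -r)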
The Proxmox 8.1.4 kernel has a driver which is incompatible with the NVIDIA driver, so you need to disable the Nouveau kernel driver before you install the NVIDIA driver. This requires a few steps.
Create a file:
nano /etc/modprobe.d/blacklist-nouveau.conf
Add the following to the file:
blacklist nouveau
options nouveau modeset=0
Now run:
update-initramfs -u
reboot
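After the reboot, it's worth a quick check that Nouveau really is gone before installing the NVIDIA driver:
# Should print nothing if Nouveau is no longer loaded
lsmod | grep nouveau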
After rebooting the Proxmox host, you can download the latest NVIDIA driver from the NVIDIA website. For this setup, choose Linux x86_64/AMD64, Latest Production Branch. Download the driver, make it executable, and run the installer as root on the Proxmox host, like this:
wget https://us.download.nvidia.com/XFree86/Linux-x86_64/535.154.05/NVIDIA-Linux-x86_64-535.154.05.run
chmod +x NVIDIA-Linux-x86_64-535.154.05.run
./NVIDIA-Linux-x86_64-535.154.05.run
Don't worry about the X library path message during the install. I also installed the 32-bit compatibility libraries, but I don't know if they were needed. And finally, don't let the installer update your xorg settings.
Now it's time to check if your GPU is detected. Run:
nvidia-smi -L
Find the cgroups assigned to your GPU:
ls -l /dev/nvidia*
In the output, look up the cgroup numbers (the major device numbers) for the /dev/nvidia* devices and for /dev/nvidia-uvm*.
In my case they were 195 and 511. Take note of these cgroup numbers!
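For reference, the output of ls -l /dev/nvidia* looks roughly like this (illustrative only; your dates and minor numbers will differ). The cgroup number you need is the major device number, i.e. the number just before the comma:
crw-rw-rw- 1 root root 195,   0 Feb  1 10:00 /dev/nvidia0
crw-rw-rw- 1 root root 195, 254 Feb  1 10:00 /dev/nvidia-modeset
crw-rw-rw- 1 root root 195, 255 Feb  1 10:00 /dev/nvidiactl
crw-rw-rw- 1 root root 511,   0 Feb  1 10:00 /dev/nvidia-uvm
crw-rw-rw- 1 root root 511,   1 Feb  1 10:00 /dev/nvidia-uvm-tools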
It's time to move on to the guest LXC where we'll run Whisper and Piper.
Firstly, you need to download a template. You do this from the Proxmox web GUI. You will probably need to access the GUI in a browser from another computer on the same network as your Proxmox server, at the GUI address (by default https://<your-proxmox-ip>:8006).
In the GUI, open the "local" storage on your Proxmox node and download the Ubuntu 22.04 template from there. The "local" storage you are looking for is located in the leftmost menu in Proxmox, towards the bottom - not to be confused with the "local" entry in the second-leftmost menu.
PS! If you plan on using Whisper and Piper with Home Assistant, it's important that your container gets an IP address that is routable from your Home Assistant server. This is most easily accomplished by having the guest connect via a network bridge, e.g. vmbr0, with the same gateway as your Home Assistant server. You should use a static IP, but a dynamic IP is fine if you reserve addresses on the DHCP server side. If you use VLANs, also make sure the bridge is VLAN aware.
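If you prefer the host shell over the GUI, the template download and container creation can also be done with pveam and pct. This is only a sketch: the template filename, container ID 101, storage names and IP addresses are assumptions you must adapt to your own setup:
# Refresh the template catalogue and download the Ubuntu 22.04 template to "local"
pveam update
pveam download local ubuntu-22.04-standard_22.04-1_amd64.tar.zst
# Create the container on a bridge (vmbr0) with a static IP, as discussed above
pct create 101 local:vztmpl/ubuntu-22.04-standard_22.04-1_amd64.tar.zst \
  --hostname whisper-piper --cores 4 --memory 4096 \
  --rootfs local-lvm:30 \
  --net0 name=eth0,bridge=vmbr0,ip=192.168.1.50/24,gw=192.168.1.1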
Now, go back to the Proxmox host shell as root to edit the LXC config file to pass through the GPU from the host to the guest:
nano /etc/pve/lxc/101.conf
Add the following 7 lines to the bottom of the LXC configuration file, swapping out 195 and 511 if your cgroup numbers differ:
lxc.cgroup2.devices.allow: c 195:* rwm
lxc.cgroup2.devices.allow: c 511:* rwm
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-modeset dev/nvidia-modeset none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
Now you are ready to start the LXC and go into it.
With the guest up and running, enter the guest shell via the console available in the Proxmox GUI.
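Alternatively, the container can be started and entered directly from the Proxmox host shell (again assuming container ID 101):
pct start 101
pct enter 101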
Installing the drivers on the guest system is easier than on the host. You need to download the same driver as for the host, but you aren't compiling anything, because we reuse the kernel modules that were built and installed on the host (the LXC shares the kernel with its host).
Download and install like this (as root on the guest):
wget https://us.download.nvidia.com/XFree86/Linux-x86_64/535.154.05/NVIDIA-Linux-x86_64-535.154.05.run
chmod +x NVIDIA-Linux-x86_64-535.154.05.run
./NVIDIA-Linux-x86_64-535.154.05.run --no-kernel-module
If the install fails in the middle, just try again; it may take 2 or 3 attempts before it goes through. As on the host, don't worry about the X/xorg messages, and install the 32-bit compatibility libraries.
In the end, check that the card is available inside the guest:
nvidia-smi -L
Your card should be listed now.
We need Docker to run Whisper and Piper. Before we install Docker, we update the system and, for the sake of cleanliness, remove any older Docker components that might be lying around. Run these commands as root on the guest system:
apt-get update
apt-get dist-upgrade
for pkg in docker.io docker-doc docker-compose docker-compose-v2 podman-docker containerd runc; do sudo apt-get remove $pkg; done
# Add Docker's official GPG key:
sudo apt-get update
sudo apt-get install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
# Add the repository to Apt sources:
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
$(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
# Install docker
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
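Before adding the GPU bits, it can be worth confirming that Docker itself works with the standard hello-world test:
sudo docker run hello-world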
Technology moves fast, but not all of it at the same pace. NVIDIA seems to be moving faster with their development kits than Wyoming Whisper does in adopting the CUDA programming interface, so this guide may already be slightly outdated. cuDNN is a critical component in this mix, and the NVIDIA web pages currently only offer version 9, but we'll install version 8 (9 may work, but I haven't tried it). We'll be using a mix of old and new libraries, but trust the process!
First, install the NVIDIA container toolkit and configure it. Do this as root on the guest system:
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
Now we need to tweak the NVIDIA container runtime configuration (things didn't work on my system unless I did this, and I'm not sure why):
vi /etc/nvidia-container-runtime/config.toml
Change:
#no-cgroups = false
To:
no-cgroups = true
Now you need to restart docker:
sudo systemctl restart docker
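At this point you can verify that containers can actually see the GPU. This is the sample workload NVIDIA uses in their container toolkit documentation (it pulls a small Ubuntu image); it should print the same card as nvidia-smi on the guest:
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi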
Additionally, we need some libraries for Whisper that we don't get from the NVIDIA container toolkit:
sudo apt install libcublas11
wget https://developer.download.nvidia.com/compute/redist/cudnn/v8.8.0/local_installers/11.8/cudnn-local-repo-ubuntu2204-8.8.0.121_1.0-1_amd64.deb
sudo dpkg -i cudnn-local-repo-ubuntu2204-8.8.0.121_1.0-1_amd64.deb
sudo cp /var/cudnn-local-repo-ubuntu2204-8.8.0.121/cudnn-local-B66125A0-keyring.gpg /usr/share/keyrings
sudo apt-get update
sudo apt install libcudnn8
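Before moving on, check that the library files the docker-compose file below expects actually exist at these paths (exact filenames can vary slightly between package versions):
ls -l /usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8 \
      /usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8 \
      /usr/lib/x86_64-linux-gnu/libcublas.so.11 \
      /usr/lib/x86_64-linux-gnu/libcublasLt.so.11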
Finally, we need to install Whisper and Piper. I've set them up with Norwegian as the base language, but it's just a matter of changing parameters and language codes if you want something else.
Tips for Whisper. Tips for Piper.
Create a file on the guest system named docker-compose.yml and fill it with the following content:
version: '3'
services:
  wyoming-whisper:
    image: rhasspy/wyoming-whisper:latest
    ports:
      - "10300:10300"
    volumes:
      - ./whisper-data:/data
      - /usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8:/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8:ro
      - /usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8:/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8:ro
      - /usr/lib/x86_64-linux-gnu/libcublasLt.so.11:/usr/lib/x86_64-linux-gnu/libcublasLt.so.12:ro
      - /usr/lib/x86_64-linux-gnu/libcublas.so.11:/usr/lib/x86_64-linux-gnu/libcublas.so.12:ro
    command: --model medium-int8 --language no --beam-size 5 --device cuda
    restart: unless-stopped
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
  piper:
    image: rhasspy/wyoming-piper
    command: --voice no_NO-talesyntese-medium
    volumes:
      - ./piper-data:/data
    environment:
      - TZ=Europe/Oslo
    restart: unless-stopped
    ports:
      - 10200:10200
Start the services:
docker compose up -d
Debug containers:
docker ps
docker logs <id>
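Since both services were started with compose, you can also follow the logs for both containers at once, which is handy when debugging them at the same time:
docker compose logs -f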
To connect to the services from Home Assistant, use the IP of the guest system + port numbers 10300 and 10200 respectively. To find the IP of the guest, run this command from the guest shell:
ip a
If you skimped on the disk, you can resize it on the fly from the host shell as root. For example, to resize the disk of guest ID 101 to 30G:
pct resize 101 rootfs 30G
# Validate that it looks ok - the change should already be in place, and no LXC restart should be needed
vi /etc/pve/lxc/101.conf
Sometimes the cgroup numbers change between host reboots. If this happens, edit the LXC config accordingly and restart the guest system. If you experience other problems, try reinstalling the driver, but make sure you do it on both host and guest, with the same driver version, to avoid a mismatch.