I decided to build a self-hosted AI system from some cheap hardware to test and learn AI.
Here’s what I have right now:
CPU: AMD Ryzen 9 3950X, 16 cores / 32 threads
Memory: 32GB
Motherboard: Aorus X570 Elite WiFi
Storage: 500GB NVMe
GPU: AMD MI50 32GB
GPU for video output: Nvidia GTX 1050 Ti 4GB
At first, I downloaded and installed Ubuntu 24.04 LTS Server. It recognized the MI50 as an MI60 (gfx906), and ROCm 6.3.4 and amdgpu-dkms installed smoothly. Then I downloaded ollama-linux-amd64.tgz and ollama-linux-amd64-rocm.tgz from the Ollama GitHub releases and extracted both with tar -C /usr -xzf. Ollama was installed and worked without any problem.
Links:
https://repo.radeon.com/amdgpu-install/6.3.4/ubuntu/noble/amdgpu-install_6.3.60304-1_all.deb
https://github.com/ollama/ollama/releases/download/v0.11.3/ollama-linux-amd64.tgz
https://github.com/ollama/ollama/releases/download/v0.11.3/ollama-linux-amd64-rocm.tgz
Then I decided to switch to Proxmox, which is easy to deploy and manage. I downloaded Proxmox 9 from their website; the installation was easy and smooth. The kernel is 6.14.8-2-pve and the base system is Debian 13 (trixie). It recognized the MI50 as AMD Vega 20, which is different from Ubuntu, and this is where the problems started.
I created a VM and installed Ubuntu 24.04 LTS Server on it. The next step was to pass the MI50 through to the VM: I added it as a PCI device, booted the VM, connected over ssh and installed ROCm 6.3.4. rocm-smi showed nothing. With dmesg | grep amdgpu I noticed the card was stuck in an atom loop, which happens because the AMD driver cannot reset Vega cards correctly. The fix is to install a kernel module named vendor-reset on the Proxmox host.
So I logged into Proxmox over ssh, installed git, dkms, make and so on, and cloned vendor-reset with git. Running dkms install . failed with compile errors. After some searching I found that vendor-reset has not been updated for several years while the kernel has moved on, so its source code needs a change. The change is easy: open src/amd/amdgpu/atom.c and change #include <asm/unaligned.h> to #include <linux/unaligned.h>. That's it. After that, I compiled and installed the vendor-reset module. The module needs to load at an early stage, so copy vendor-reset/udev/99-vendor-reset.rules to /etc/udev/rules.d/. Then reboot Proxmox and the VM.
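If you rebuild vendor-reset often (for example after each kernel update), the one-line source change can be scripted. This is only a sketch: the function names are made up for illustration, and it assumes the include appears in atom.c exactly as quoted above.

```python
from pathlib import Path

# The two include lines quoted in the post.
OLD_INCLUDE = "#include <asm/unaligned.h>"
NEW_INCLUDE = "#include <linux/unaligned.h>"


def patch_unaligned_include(source: str) -> str:
    """Return the C source with the old unaligned.h include swapped for the new one."""
    return source.replace(OLD_INCLUDE, NEW_INCLUDE)


def patch_file(path: Path) -> bool:
    """Patch a file in place. Returns True if the file was changed."""
    original = path.read_text()
    patched = patch_unaligned_include(original)
    if patched == original:
        return False  # nothing to do, or already patched
    path.write_text(patched)
    return True
```

From inside the cloned vendor-reset directory you would call patch_file(Path("src/amd/amdgpu/atom.c")) and then run dkms install . again.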
You may also need to add pci=realloc,nocrs to the kernel command line in /etc/default/grub:
GRUB_CMDLINE_LINUX_DEFAULT="quiet pci=realloc,nocrs amd_iommu=pt iommu=pt vfio_pci.ids=1002:66a1"
Save the edited file, run update-grub, and reboot.
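After the reboot it is worth confirming that the parameters really reached the running kernel. A small sketch, assuming you compare against /proc/cmdline; the function names are hypothetical and the list of required parameters is whatever you put into GRUB.

```python
def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line of the running kernel."""
    with open(path) as handle:
        return handle.read().strip()


def missing_kernel_params(cmdline: str, required: list[str]) -> list[str]:
    """Return the required parameters that do not appear on the command line."""
    present = set(cmdline.split())
    return [param for param in required if param not in present]
```

For example, missing_kernel_params(read_cmdline(), ["pci=realloc,nocrs", "iommu=pt"]) should return an empty list on the Proxmox host once update-grub and the reboot have taken effect.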
Make sure the content of /sys/bus/pci/devices/0000:0c:00.0/reset_method is device_specific; otherwise the vendor_reset module was not loaded correctly. You can test the reset with echo 1 > /sys/bus/pci/devices/0000:0c:00.0/reset
After the reboot, start the VM with qm start, and rocm-smi will show the MI50 inside the VM.
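The reset_method check can also be done from a script before starting the VM. This is a sketch with hypothetical function names; the PCI address 0000:0c:00.0 is the one from my system and will differ on yours. The file may list several methods separated by spaces, so the check looks for device_specific among them.

```python
from pathlib import Path


def reset_methods(bdf: str, sysfs_root: str = "/sys/bus/pci/devices") -> list[str]:
    """Return the reset methods listed for the PCI device at address bdf."""
    text = (Path(sysfs_root) / bdf / "reset_method").read_text()
    return text.split()


def uses_device_specific_reset(bdf: str, sysfs_root: str = "/sys/bus/pci/devices") -> bool:
    """True if device_specific is among the device's reset methods."""
    return "device_specific" in reset_methods(bdf, sysfs_root)
```

On the Proxmox host, uses_device_specific_reset("0000:0c:00.0") returning False is a hint that vendor-reset did not get hold of the card.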
Some additional setup:
On the VM:
If you get the error "Unable to open /dev/kfd read-write: Permission denied", add your user to the render and video groups, then log out and back in:
sudo usermod -aG render,video $USER
On the Proxmox host:
nano /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt"
Run: update-grub
nano /etc/modprobe.d/vfio.conf
options vfio-pci ids=1002:66a3 disable_vga=1
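Run: update-initramfs -u
(Use the vendor:device ID that lspci -nn reports for your own card; note that the kernel command line earlier in this post uses 1002:66a1.)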
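To avoid typing the wrong ID into vfio.conf, you can pull it out of the lspci -nn output. A rough sketch: the function names are hypothetical, and it assumes the usual lspci -nn layout where the vendor:device pair is printed in square brackets on each line.

```python
import re

# Matches the [vendor:device] pair, e.g. [1002:66a1]; class codes like [0380] have no colon.
PCI_ID = re.compile(r"\[([0-9a-f]{4}):([0-9a-f]{4})\]", re.IGNORECASE)


def vfio_ids(lspci_nn_output: str, keyword: str) -> list[str]:
    """Return vendor:device IDs of all lspci lines that mention keyword."""
    ids = []
    for line in lspci_nn_output.splitlines():
        if keyword.lower() not in line.lower():
            continue
        match = PCI_ID.search(line)
        if match:
            ids.append(f"{match.group(1)}:{match.group(2)}".lower())
    return ids


def vfio_options_line(ids: list[str]) -> str:
    """Build the options line for /etc/modprobe.d/vfio.conf."""
    return "options vfio-pci ids=" + ",".join(ids) + " disable_vga=1"
```

Feed it the text of lspci -nn and a keyword such as "Vega 20", then paste the result of vfio_options_line into vfio.conf.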
The VM config file, /etc/pve/qemu-server/100.conf:
agent: 1
balloon: 0
bios: seabios
boot: order=virtio0;net0
cores: 8
cpu: host
cpuunits: 1024
hostpci0: 0c:00,pcie=1,rombar=0
machine: q35,viommu=intel
memory: 30000
meta: creation-qemu=10.0.2,ctime=1754522929
name: ubuntu2404
net0: virtio=BC:24:11:64:3C:98,bridge=vmbr0,queues=8
numa: 1
ostype: l26
scsihw: virtio-scsi-single
smbios1: uuid=c6c9e083-d046-4867-aec6-03ad5c0df5a6
sockets: 1
virtio0: local-lvm:vm-100-disk-0,discard=on,iothread=1,size=300G
vmgenid: ba5146ec-cfbe-4a1d-9002-ac9858fa42fe
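The passthrough-relevant lines in this file are hostpci0 and machine. If you manage several VMs, a small parser can sanity-check them. This is a sketch under the assumption that the file is plain "key: value" lines as shown above; the function names are hypothetical. The one rule it checks is that pcie=1 on a hostpci entry needs a q35 machine type.

```python
def parse_vm_config(text: str) -> dict[str, str]:
    """Parse 'key: value' lines of a VM config into a dict."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")  # split at the first colon only
        if sep:
            config[key.strip()] = value.strip()
    return config


def parse_hostpci(value: str) -> tuple[str, dict[str, str]]:
    """Split '0c:00,pcie=1,rombar=0' into the address and an options dict."""
    address, *options = value.split(",")
    parsed = {}
    for option in options:
        name, _, setting = option.partition("=")
        parsed[name] = setting
    return address, parsed


def check_passthrough(config: dict[str, str], key: str = "hostpci0") -> list[str]:
    """Return a list of problems found with the passthrough entry (empty if none)."""
    if key not in config:
        return [f"{key} is not set"]
    _, options = parse_hostpci(config[key])
    problems = []
    if options.get("pcie") == "1" and not config.get("machine", "").startswith("q35"):
        problems.append("pcie=1 needs a q35 machine type")
    return problems
```

Reading /etc/pve/qemu-server/100.conf and passing its text through parse_vm_config and check_passthrough should give an empty list for the config above.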
Ollama installed and worked without problems; the installation is the same as before.
I'm installing vLLM now and will update this post later.
Update:
I just installed vLLM for gfx906. Here's how:
Clone both repositories: git clone https://github.com/nlzy/vllm-gfx906 and git clone https://github.com/nlzy/triton-gfx906
Use conda to create an environment for vLLM: conda create -n vllm-venv python=3.11, then conda activate vllm-venv.
Install Triton first:
cd triton-gfx906
pip3 install 'torch==2.7' torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.3
pip3 install ninja 'cmake<4' wheel pybind11
pip3 install python/
Then install vLLM:
cd ../vllm-gfx906
pip3 install 'torch==2.7' torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.3
pip3 install -r requirements/rocm-build.txt
pip3 install -r requirements/rocm.txt
pip3 install --no-build-isolation .
If you get the error /lib/libstdc++.so.6: version `GLIBCXX_3.4.32' not found, run: conda install -c conda-forge libstdcxx-ng
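To see whether a given libstdc++.so.6 actually provides the symbol version from the error message, you can search the library file for the version string, much like strings piped into grep. A sketch only: the function name is hypothetical, and which library path to check depends on your environment (use the path that the error message reports).

```python
from pathlib import Path


def has_glibcxx_version(library: Path, version: str = "GLIBCXX_3.4.32") -> bool:
    """True if the library file contains the given version string.

    The trailing NUL byte keeps GLIBCXX_3.4.3 from matching GLIBCXX_3.4.30.
    """
    data = library.read_bytes()
    return version.encode() + b"\x00" in data
```

Run it on the library before and after installing libstdcxx-ng to confirm that the new one carries GLIBCXX_3.4.32.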
After vLLM was installed, I tried vllm --help and got this error: ValueError: 'aimv2' is already used by a Transformers config, pick another name. The solution is to downgrade transformers to a version lower than 4.54.0: pip install "transformers<4.54.0"
When I tried to load a model with vLLM, I got another error: torch._inductor.exc.InductorError: ZeroDivisionError: float division by zero. The reason is that torch asks for the GPU's memory bandwidth, which the MI50 does not report. The solution is to modify ~/miniconda3/envs/vllm-venv/lib/python3.11/site-packages/torch/_inductor/scheduler.py. Find:
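A quick way to check whether the installed transformers is on the safe side of 4.54.0 is to compare version numbers from Python. This is a simplified sketch with hypothetical function names: it compares only the leading numeric parts and ignores suffixes such as .dev0 or rc1.

```python
from importlib.metadata import PackageNotFoundError, version


def version_tuple(text: str) -> tuple[int, ...]:
    """Turn '4.53.3' into (4, 53, 3), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for char in piece:
            if not char.isdigit():
                break
            digits += char
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_below(installed: str, limit: str = "4.54.0") -> bool:
    """True if the installed version is lower than the limit."""
    return version_tuple(installed) < version_tuple(limit)


def transformers_is_ok() -> bool:
    """Check the transformers package of the current environment."""
    try:
        return is_below(version("transformers"))
    except PackageNotFoundError:
        return False  # not installed in this environment
```

Run transformers_is_ok() inside the vllm-venv environment; if it returns False, the pip install command above is what fixed it for me.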
gpu_memory_bandwidth = get_gpu_dram_gbps()
gpu_flops = get_device_tflops(dtype) * 10**12
Add the line gpu_memory_bandwidth = 1024000000000 # 1 TB/s (MI50's HBM2 bandwidth) between them. The final code:
gpu_memory_bandwidth = get_gpu_dram_gbps()
gpu_memory_bandwidth = 1024000000000 # 1 TB/s (MI50’s HBM2 bandwidth)
gpu_flops = get_device_tflops(dtype) * 10**12
Save the file. That's it.
Finally, vLLM runs! As we can see, AMD cards need a lot of fixes to get them running.
The result: the MI50 runs with Ollama and vLLM inside an Ubuntu 24.04.2 Server VM under Proxmox 9.
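Because the edit lives in site-packages, it disappears whenever torch is reinstalled, so it can help to script it. The sketch below is one way to do that; the function name is hypothetical, and it assumes the anchor line appears in scheduler.py exactly as quoted above. The override value is the one from this post; the helper's name suggests the number may be expected in GB/s, so treat it as a placeholder.

```python
# Line to look for, and the override to add right after it (both as quoted in the post).
ANCHOR = "gpu_memory_bandwidth = get_gpu_dram_gbps()"
OVERRIDE = "gpu_memory_bandwidth = 1024000000000  # 1 TB/s (MI50's HBM2 bandwidth)"


def add_bandwidth_override(source: str) -> str:
    """Insert the override after every anchor line, keeping its indentation.

    Running it twice does not add a second override.
    """
    lines = source.split("\n")
    patched = []
    for index, line in enumerate(lines):
        patched.append(line)
        if line.strip() != ANCHOR:
            continue
        following = lines[index + 1].strip() if index + 1 < len(lines) else ""
        if following.startswith("gpu_memory_bandwidth ="):
            continue  # already patched
        indent = line[: len(line) - len(line.lstrip())]
        patched.append(indent + OVERRIDE)
    return "\n".join(patched)
```

To apply it, read scheduler.py, pass the text through add_bandwidth_override, and write the result back (keep a backup copy of the original file first).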