Anton

Reputation: 11

Ollama isn't using my GPU on a runpod.io pod

I am testing different AI models on runpod.io. One of those models is dolphin-mixtral:8x22b. I followed Runpod's tutorial for setting up the pod with Ollama (https://docs.runpod.io/tutorials/pods/run-ollama), using an H100 SXM pod with 80 GB VRAM, 16 vCPUs, and 125 GB RAM.

However, when I start the model and ask it something like "hey," it uses 100% of the CPU and 0% of the GPU, and the response takes 5-10 minutes.

How can I make Ollama use my GPU?

I have tried different server settings.
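
For context, whether the GPU is visible inside the pod at all can be checked from the pod shell before touching any model settings (a quick sanity check, assuming the standard NVIDIA tooling is available in the container):

# Should list the H100; while a prompt is being processed, the
# ollama process should also appear here with nonzero VRAM usage.
nvidia-smi

# Refresh the view every second while sending a prompt, to see
# whether GPU memory and utilization move at all.
watch -n 1 nvidia-smi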

Upvotes: 0

Views: 1520

Answers (2)

Bakri Bitar

Reputation: 1697

I just came across the same problem. I ended up running smaller models (fewer parameters).

What happens is that the VRAM gets fully occupied by this large model, so parts of the model are forced out to system RAM and run on the CPU.
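
This diagnosis can be confirmed with Ollama's own ollama ps command, which reports how the loaded model was split between CPU and GPU. The 8x22b weights alone are on the order of 80 GB at the default quantization, which leaves little or no headroom for the KV cache in 80 GB of VRAM; the smaller tag below is just an example, so check the Ollama library for exact sizes:

# Shows the loaded model and its processor split, e.g. "100% GPU"
# when it fits entirely in VRAM, or a split like "48%/52% CPU/GPU"
# when layers were offloaded to system RAM (which is what makes it slow).
ollama ps

# Example: the smaller 8x7b variant of the same model family
# fits comfortably in 80 GB of VRAM.
ollama run dolphin-mixtral:8x7b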

Upvotes: 0

Julian Patdu

Reputation: 21

I ran into the same issue and found the answer in this Reddit post.

Set the CUDA_VISIBLE_DEVICES environment variable to 0,1 before running ollama serve:

:/# export CUDA_VISIBLE_DEVICES=0,1
:/# echo $CUDA_VISIBLE_DEVICES
0,1
:/# ollama serve

Tested on an A4000 pod.
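
On a single-GPU pod such as the H100 SXM from the question, the same approach would use only device index 0. A minimal sketch, assuming Ollama was installed as in the Runpod tutorial:

# Expose only the first GPU to Ollama and start the server.
CUDA_VISIBLE_DEVICES=0 ollama serve

# In another shell, confirm the model actually landed on the GPU.
ollama ps
nvidia-smi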

Upvotes: 1
