Installing NVIDIA drivers for application on K8S

Question

We have a flask app that's deployed on k8s. The base image of the app is this: https://hub.docker.com/r/tiangolo/uwsgi-nginx-flask/, and we build our app on top of this. We ship our docker image to ECR, and then deploy pods on k8s.

We want to start running ML models in our k8s nodes. The underlying nodes have GPUs (we're using g4dn instances), and they are using a GPU AMI.

When running our app, I'm seeing the following error:

/usr/local/lib/python3.8/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx (Triggered internally at  /pytorch/c10/cuda/CUDAFunctions.cpp:100.)
  return torch._C._cuda_getDeviceCount() > 0

What's the right way to get CUDA installed on our nodes? I would have expected it to be built into the AMI shipped with the gpu instances but that doesn't seem to be the case.

Installing NVIDIA drivers for application on K8S

Answers (1)

Related Questions