de1337ed

Reputation: 3315

Installing NVIDIA drivers for an application on K8s

We have a Flask app deployed on k8s. The base image is https://hub.docker.com/r/tiangolo/uwsgi-nginx-flask/, and we build our app on top of it. We push our Docker image to ECR and then deploy pods on k8s.

We want to start running ML models on our k8s nodes. The underlying nodes have GPUs (we're using g4dn instances) and run a GPU AMI.

When running our app, I'm seeing the following error:

/usr/local/lib/python3.8/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx (Triggered internally at  /pytorch/c10/cuda/CUDAFunctions.cpp:100.)
  return torch._C._cuda_getDeviceCount() > 0

What's the right way to get CUDA working on our nodes? I would have expected it to be built into the AMI shipped with the GPU instances, but that doesn't seem to be the case.
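For context, our pods don't currently request a GPU at all. My understanding is that the pod spec would need something like the following (a sketch only, assuming the NVIDIA device plugin DaemonSet is deployed on the cluster; the pod name and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: flask-ml          # placeholder name
spec:
  containers:
    - name: app
      # placeholder for our ECR image
      image: <account>.dkr.ecr.<region>.amazonaws.com/our-app:latest
      resources:
        limits:
          nvidia.com/gpu: 1   # only schedulable if the NVIDIA device plugin is running
```

Without that `nvidia.com/gpu` limit, the scheduler won't expose the GPU to the container even on a GPU node.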

Upvotes: 0

Views: 261

Answers (1)

Danylo Baibak

Reputation: 2316

There are a couple of options:

  1. Use tensorflow/tensorflow:latest-gpu as the base image and add whatever extra configuration your system needs on top of it.
  2. Install the CUDA libraries yourself in your Docker image (the kernel driver itself comes from the node's GPU AMI and is exposed to containers by the NVIDIA container runtime).
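For option 2, a sketch of what that could look like, starting from NVIDIA's CUDA runtime image instead of tiangolo/uwsgi-nginx-flask (the image tag, CUDA version, and entrypoint are assumptions; match them to your node's driver version):

```dockerfile
# Assumption: base on NVIDIA's CUDA runtime image, which ships the CUDA
# user-space libraries; the driver itself stays on the host node.
FROM nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04

# Install Python and pip (packages/versions are placeholders)
RUN apt-get update && apt-get install -y --no-install-recommends \
        python3 python3-pip \
    && rm -rf /var/lib/apt/lists/*

# Install a CUDA-enabled PyTorch wheel matching the base image's CUDA version
RUN pip3 install torch --index-url https://download.pytorch.org/whl/cu118

COPY . /app
WORKDIR /app
CMD ["python3", "main.py"]   # placeholder entrypoint for your Flask app
```

Either way, the image only needs the CUDA user-space libraries; the warning in your traceback goes away once the container can see the host driver through the NVIDIA runtime and a GPU is actually allocated to the pod.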

Upvotes: 1
