S. Jay

Reputation: 313

Error "libnvidia-ml.so.1: cannot open shared object file" when running a Docker image with GPU

The error:

nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown

I am trying to use the nvidia/cuda image from Docker Hub to access the GPU, so I run the command below with --gpus all.

docker run -it --gpus all --name my-gpu nvidia/cuda:11.7.0-cudnn8-devel-ubuntu22.04

But this gives me the following error:

Unable to find image 'nvidia/cuda:11.7.0-cudnn8-devel-ubuntu22.04' locally

11.7.0-cudnn8-devel-ubuntu22.04: Pulling from nvidia/cuda
d19f32bd9e41: Already exists 
292e5e4dcc78: Already exists 
f027458ef473: Already exists 
ad9cd0a3350e: Already exists 
4c0e748dfb24: Already exists 
e40f2cbf6f5e: Pull complete 
3ac150f167fe: Pull complete 
dd80ebac36de: Pull complete 
fd2716719ab6: Pull complete 
e5ff1925518e: Pull complete 
Digest: sha256:1055a2fa47b063336f578f390131efa4bb02fbfe095608481fd32b6fca9b8b32
Status: Downloaded newer image for nvidia/cuda:11.7.0-cudnn8-devel-ubuntu22.04
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown.
ERRO[0465] error waiting for container: context canceled 

However, if I run the same command with sudo, it works fine:

sudo docker run -it --gpus all --name my-container-03 nvidia/cuda:11.7.0-cudnn8-devel-ubuntu22.04

How can I make it run without sudo? In my current setup, I cannot use sudo.

Upvotes: 18

Views: 27519

Answers (5)

MenoMore

Reputation: 44

Apparently, to use the nvidia runtime with the nvidia-container-toolkit, you have to use sudo.

Running it without sudo causes the libnvidia-ml error.

Here's what I did:

# BUILD IMAGE
sudo docker build -t my-image:latest .

# RUN IMAGE WITH NVIDIA RUNTIME
sudo docker run --rm --runtime=nvidia --gpus all my-image:latest

Upvotes: 1

Prajot Kuvalekar

Reputation: 6668

This error pops up when you use the --gpus all flag in your docker run command, and it usually points to an incorrect installation of your Docker package.

A simple solution I found was to run the following command, which installs the package as documented:

sudo apt-get install -y nvidia-docker2
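Installing nvidia-docker2 is meant to register an `nvidia` runtime with the Docker daemon, and that registration usually only takes effect after the daemon is restarted. A minimal sketch of a check, assuming a systemd host and that `docker info` lists the registered runtimes on a `Runtimes:` line (the helper name `has_nvidia_runtime` is hypothetical):

```shell
#!/usr/bin/env bash
# Sketch: confirm that the Docker daemon knows about the "nvidia" runtime.
# Assumption: `docker info` prints a line such as
#   " Runtimes: io.containerd.runc.v2 nvidia runc"
# has_nvidia_runtime is a hypothetical helper name.

# Reads `docker info` output on stdin; succeeds if a Runtimes line lists nvidia.
has_nvidia_runtime() {
  grep -Eq '^[[:space:]]*Runtimes:.*[[:space:]]nvidia([[:space:]]|$)'
}

# Typical sequence after installing the package (commented out so the
# sketch can be run without root):
#   sudo apt-get install -y nvidia-docker2
#   sudo systemctl restart docker

if command -v docker >/dev/null 2>&1 && docker info 2>/dev/null | has_nvidia_runtime; then
  echo "nvidia runtime is registered"
else
  echo "nvidia runtime not found (or docker is unavailable)"
fi
```

The check only looks for the runtime name; it does not prove that a GPU container will start.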

Upvotes: 0

Vincz777

Reputation: 698

I had Docker installed via snap. I removed it (sudo snap remove --purge docker), reinstalled it with apt, and it worked.
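One quick way to tell whether the `docker` on your PATH is the snap package is to look at where the command resolves, since snap exposes its commands under `/snap/bin` (or `/var/lib/snapd/snap/bin` on some distributions). A minimal sketch, where `is_snap_docker` is a hypothetical helper; the removal command is the one from this answer, and the apt reinstall should follow Docker's own installation docs:

```shell
#!/usr/bin/env bash
# Sketch: detect a snap-packaged docker CLI.
# Assumption: snap exposes commands under /snap/bin (or /var/lib/snapd/snap/bin
# on some distributions). is_snap_docker is a hypothetical helper name.

# Succeeds if the given path (default: the docker found on PATH) is a snap path.
is_snap_docker() {
  local docker_path="${1:-$(command -v docker)}"
  case "$docker_path" in
    /snap/* | /var/lib/snapd/snap/*) return 0 ;;
    *) return 1 ;;
  esac
}

if is_snap_docker; then
  echo "docker comes from snap; removal command from this answer:"
  echo "  sudo snap remove --purge docker"
else
  echo "docker is not a snap install (or is not on PATH)"
fi
```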

Upvotes: 6

PALEN

Reputation: 2876

I had the error described. In my case, what helped was to:

Uninstall everything (pre-existing CUDA, NVIDIA drivers, and Docker), then follow the installation steps (pre-installation, installation, post-installation) in:

https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html

That guide contains definitive instructions for uninstalling & installing (which I used and worked).
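The error itself says that `nvidia-container-cli` could not load `libnvidia-ml.so.1`, the NVML library that ships with the NVIDIA driver. Before and after reinstalling, one way to see whether the host's dynamic linker can find it is to search the `ldconfig -p` cache. A minimal sketch, where `has_nvml` is a hypothetical helper; a hit in the cache does not by itself prove that the container toolkit is configured correctly:

```shell
#!/usr/bin/env bash
# Sketch: check whether libnvidia-ml.so.1 is in the dynamic linker cache.
# has_nvml is a hypothetical helper; it reads `ldconfig -p` output on stdin.
has_nvml() {
  grep -q 'libnvidia-ml\.so\.1[[:space:]]'
}

# ldconfig may live in /sbin, which is not always on a normal user's PATH.
ldconfig_bin="$(command -v ldconfig || echo /sbin/ldconfig)"

if [ -x "$ldconfig_bin" ] && "$ldconfig_bin" -p 2>/dev/null | has_nvml; then
  echo "libnvidia-ml.so.1 found in the linker cache"
else
  echo "libnvidia-ml.so.1 not found: check the NVIDIA driver installation"
fi
```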

Upvotes: 1

S. Jay

Reputation: 313

It was solved when I installed Docker Desktop.

Upvotes: -7
