Reputation: 53
I use Singularity and I need to install an NVIDIA driver in my Singularity container to do some deep learning with a GTX 1080. The Singularity image is created from an NVIDIA Docker image from here: https://ngc.nvidia.com/catalog/containers/nvidia:kaldi and converted to a Singularity container. I think there was no NVIDIA driver in it, because nvidia-smi was not found before I installed the driver.
I ran the following commands:
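For context, the Docker-to-Singularity conversion can be done with something like the following (the output file name and the image tag are only placeholders, and pulling from NGC may additionally require logging in with an NGC API key):
singularity build kaldi.simg docker://nvcr.io/nvidia/kaldi:<tag>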
add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
apt install nvidia-418
After that, I wanted to check whether the driver was installed correctly, so I ran:
nvidia-smi
which returned: Failed to initialize NVML: Driver/library version mismatch
I searched for how to solve this error and found this topic: NVIDIA NVML Driver/library version mismatch
One answer says to run:
lsmod | grep nvidia
and then to rmmod each listed module except nvidia, and finally to rmmod nvidia.
rmmod drm
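For reference, the unload sequence that answer describes usually looks roughly like this (the exact module names depend on the driver version):
sudo rmmod nvidia_drm
sudo rmmod nvidia_modeset
sudo rmmod nvidia_uvm
sudo rmmod nvidia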
But when I do this, as the topic anticipated, I get the error: rmmod: ERROR: Module nvidia is in use.
The topic says to run lsof /dev/nvidia* and to kill the processes that use the module, but I see nothing with drm in the output, and it seems like a very bad idea to kill those processes (Xorg, gnome-she).
Here is the output of lsof /dev/nvidia*, followed by the output of lsmod | grep nvidia, and then of rmmod drm:
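(For what it's worth, the usual way to release the module without killing those processes one by one is to switch to a text console and stop the display manager first, then unload the modules; which display manager is installed depends on the system:)
sudo systemctl stop gdm3    # or lightdm / sddm, depending on the system
sudo rmmod nvidia_drm nvidia_modeset nvidia_uvm nvidia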
Rebooting the computer also didn't work.
What should I do to get nvidia-smi working and be able to use my GPU from inside the Singularity container?
Thank you
Upvotes: 0
Views: 5262
Reputation: 53
Thank you for your answer. I wanted to install the GPU driver in the Singularity container because, from inside the container, I wasn't able to use the GPU (nvidia-smi: command not found), while outside of the container nvidia-smi worked.
You are right, the driver should be installed outside of the container; I only tried to install it in the container to work around not having access to the driver from inside it.
I have now found the solution: to use the GPU from inside the Singularity container, you must add --nv when calling the container. For example:
singularity exec --nv singularity_container.simg ~/test_gpu.sh
or
singularity shell --nv singularity_container.simg
When you add --nv, the container gets access to the NVIDIA driver and nvidia-smi works. Without it, you will not be able to use the GPU and nvidia-smi will not work.
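A quick sanity check is to run nvidia-smi directly through the container:
singularity exec --nv singularity_container.simg nvidia-smi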
Upvotes: 1
Reputation: 3772
You may need to do the above steps in the host OS and not in the container itself. /dev is mounted into the container as-is and is still subject to use by the host, even though the container's processes run in a different userspace.
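As a rough illustration of what that means (the device node names below are the typical ones):
ls /dev/nvidia*          # on the host, e.g. /dev/nvidia0  /dev/nvidiactl  /dev/nvidia-uvm
# the container sees the same /dev; --nv additionally binds in the driver's
# user-space libraries and tools (nvidia-smi, libnvidia-ml, ...) from the host
singularity exec --nv singularity_container.simg nvidia-smi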
Upvotes: 1