Elena

Reputation: 91

Enable nvidia-smi permissions to be run by all users

How can I enable nvidia-smi for all users? I can run it with sudo, but as a regular user I get:

Failed to initialize NVML: Insufficient Permissions

Upvotes: 9

Views: 11229

Answers (6)

YUN-CHIEN

Reputation: 21

PC

sudo vim /etc/nvidia-container-runtime/config.toml

Change the line #user = "root:video" to user = "root:vglusers" and save the file. You may then need to run sudo nvidia-smi and sudo reboot -h.
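The edit above can also be scripted. This is only a minimal sketch: set_runtime_user is a hypothetical helper (not part of any NVIDIA tool), and it is demonstrated on a scratch file rather than on the real config, which needs root to modify.

```shell
# Hypothetical helper: set the "user" key in a config file, whether the
# existing line is commented out ("#user = ...") or not.
set_runtime_user() {
    file=$1
    owner=$2
    tmp=$(mktemp) || return 1
    # Match an optional leading '#', then "user =", and replace the whole line.
    sed -E "s|^#?[[:space:]]*user[[:space:]]*=.*|user = \"$owner\"|" "$file" > "$tmp" \
        && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return $status
}

# Try it on a scratch file first.
scratch=$(mktemp)
printf '%s\n' '#user = "root:video"' > "$scratch"
set_runtime_user "$scratch" "root:vglusers"
cat "$scratch"
rm -f "$scratch"
```

To apply it for real, the function would have to run as root against /etc/nvidia-container-runtime/config.toml; keeping a backup copy of the file first seems prudent.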


nvidia-docker

First run ll /dev/nvidia* to confirm the user:group of the NVIDIA devices.

Then add that group in the Dockerfile:

echo "vglusers:x:1001:${USER}" >> /etc/group

The one thing I am still unsure about is whether any system variable holds the value 1001.
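On the 1001 question: that number is the group ID (the third field of an /etc/group entry), and as far as I know no standard environment variable carries it. A small sketch of building the entry from explicit values follows; group_entry is a hypothetical helper, and 1001 is just the example value used above.

```shell
# Hypothetical helper: print an /etc/group entry (name:password:GID:members).
group_entry() {
    printf '%s:x:%s:%s\n' "$1" "$2" "$3"
}

# On the host, the numeric GID that owns the device can be read with GNU stat:
#   gid=$(stat -c '%g' /dev/nvidia0)
# 1001 below is only the example value from this answer.
group_entry vglusers 1001 "${USER:-user}"
```

The GID inside the container should match the one that owns the device nodes on the host, so reading it from the device is likely safer than hard-coding it.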

Upvotes: 2

OlivierBondu

Reputation: 197

If your issue is happening when attempting to run the command from within a docker container AND you have SELinux activated (say using RHEL for example), then this might do the trick:

First check the SELinux context of your NVIDIA devices:

$ ls -lZ /dev/nvidia*
crw-rw-rw-. 1 root root system_u:object_r:xserver_misc_device_t:s0 195,   0 Dec 17 09:44 /dev/nvidia0
crw-rw-rw-. 1 root root system_u:object_r:xserver_misc_device_t:s0 195, 255 Dec 17 09:44 /dev/nvidiactl
crw-rw-rw-. 1 root root system_u:object_r:xserver_misc_device_t:s0 243,   0 Dec 17 09:44 /dev/nvidia-fs0
crw-rw-rw-. 1 root root system_u:object_r:xserver_misc_device_t:s0 243,   1 Dec 17 09:44 /dev/nvidia-fs1
crw-rw-rw-. 1 root root system_u:object_r:xserver_misc_device_t:s0 243,  10 Dec 17 09:44 /dev/nvidia-fs10
...

As you can see, the security context (xserver_misc_device_t) does not allow containers to access the devices, so change the security context as per the documentation above:

$ chcon -t container_file_t /dev/nvidia*

Now check again the security context:

$ ls -lZ /dev/nvidia*
crw-rw-rw-. 1 root root system_u:object_r:container_file_t:s0     195,   0 Dec 17 09:44 /dev/nvidia0
crw-rw-rw-. 1 root root system_u:object_r:container_file_t:s0     195, 255 Dec 17 09:44 /dev/nvidiactl
crw-rw-rw-. 1 root root system_u:object_r:container_file_t:s0     243,   0 Dec 17 09:44 /dev/nvidia-fs0
crw-rw-rw-. 1 root root system_u:object_r:container_file_t:s0     243,   1 Dec 17 09:44 /dev/nvidia-fs1
crw-rw-rw-. 1 root root system_u:object_r:container_file_t:s0     243,  10 Dec 17 09:44 /dev/nvidia-fs10
...

This should allow your containers to access the hardware (at least for us it did the trick).
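To check many devices at once, the type field can be pulled out of each context string. A minimal sketch, where selinux_type is a hypothetical helper and the context string is copied from the listing above:

```shell
# Hypothetical helper: extract the type field from an SELinux context
# of the form user:role:type:level.
selinux_type() {
    printf '%s\n' "$1" | cut -d: -f3
}

selinux_type "system_u:object_r:xserver_misc_device_t:s0"
```

On a real system the context of a device can be read with GNU stat, for example stat -c '%C' /dev/nvidia0, and passed to the helper.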

Upvotes: 3

Leszek Gajecki

Reputation: 31

Following @Zealseeker's answer: since my machine has several users, I changed the permissions to rw-rw-rw- for the following devices (I have two cards):

ll /dev/nvidia*

crw-rw-rw- 1 root vglusers 195,   0 Oct 13 19:34 /dev/nvidia0
crw-rw-rw- 1 root vglusers 195,   1 Oct 13 19:34 /dev/nvidia1
crw-rw-rw- 1 root vglusers 195, 255 Oct 13 19:34 /dev/nvidiactl
...

This idea came from the device files on a second machine, which had not been updated to the nvidia-470 driver.
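The answer does not show the command used; sudo chmod a+rw on each device would be one way (an assumption). The sketch below only checks the result. is_world_rw is a hypothetical helper and relies on GNU stat.

```shell
# Hypothetical helper: succeed if "others" have both read and write permission.
# Relies on GNU stat's -c option.
is_world_rw() {
    mode=$(stat -c '%a' "$1") || return 1
    other=${mode#"${mode%?}"}   # last octal digit of the mode
    case $other in
        6|7) return 0 ;;
        *)   return 1 ;;
    esac
}

for dev in /dev/nvidia*; do
    [ -e "$dev" ] || continue   # glob did not match: no NVIDIA devices here
    if is_world_rw "$dev"; then
        echo "$dev: readable and writable by all users"
    else
        echo "$dev: restricted"
    fi
done
```

Permissions changed by hand on device nodes may not survive a reboot or a driver reload, so a check like this is worth rerunning afterwards.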

Upvotes: 0

zacario

Reputation: 1

Refer to the nvidia-docker GitHub issue:
modify /etc/nvidia-container-runtime/config.toml
uncomment user = "root:video"
change it to user = "root:root"

Before modifying the config file, I also deleted the vglusers group.

Upvotes: 0

Zealseeker

Reputation: 823

I had this problem, and here is my solution. Maybe it will help you.

Running ll /dev/nvidia* shows that the devices belong to root and the vglusers group.

If your situation is the same as mine, add your user account to the vglusers group,

either with usermod -a -G vglusers username (requires sudo)

or by editing /etc/group and adding your username at the end of the line vglusers:x:****:user1,user2,...

Then log out of the shell and log back in.

Notes:

  • The group name may be video rather than vglusers; you'll see which name is used when you run ll /dev/nvidia*.
  • If you are using vncserver, you have to kill the server and restart it, because it was started before you had the permission.
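A quick way to confirm that the new membership is active in the current session is to look at the output of id -nG. The sketch below assumes the group is called vglusers; has_group is a hypothetical helper.

```shell
# Hypothetical helper: succeed if the space-separated group list in $1
# contains the group name in $2.
has_group() {
    for g in $1; do           # unquoted on purpose: split the list on spaces
        [ "$g" = "$2" ] && return 0
    done
    return 1
}

# id -nG prints the group names of the current session.
if has_group "$(id -nG 2>/dev/null)" vglusers; then
    echo "vglusers is active in this session"
else
    echo "vglusers is not active yet: log out and back in"
fi
```

Because group membership is fixed when a session starts, this keeps reporting "not active" in shells that were opened before the usermod call.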

Upvotes: 7

SEONGMOOK LIM

Reputation: 21

Have you ever installed VirtualGL? I had the same problem, and VirtualGL turned out to be the cause. Run the VirtualGL installation file and select "Unconfigure server for use with VirtualGL". Then everything works normally.

IMPORTANT NOTE: Your system uses modprobe.d to set device permissions. You must execute rmmod nvidia with the display manager stopped in order for the new device permission settings to become effective.
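To see what modprobe.d currently sets, you can look for the NVIDIA driver's NVreg_DeviceFileUID, NVreg_DeviceFileGID and NVreg_DeviceFileMode options. The sketch below parses one such line; nvreg_value is a hypothetical helper, and the sample line with its values is illustrative rather than taken from a real system.

```shell
# Hypothetical helper: print the value of one parameter from an
# "options nvidia ..." line. $1 is the line, $2 the parameter name.
nvreg_value() {
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n "s/^$2=//p"
}

# Illustrative line; real ones can be listed with:
#   grep -rs NVreg_DeviceFile /etc/modprobe.d/
line='options nvidia NVreg_DeviceFileUID=0 NVreg_DeviceFileGID=1001 NVreg_DeviceFileMode=0660'
nvreg_value "$line" NVreg_DeviceFileMode
nvreg_value "$line" NVreg_DeviceFileGID
```

A mode of 0660 combined with a dedicated group would match the behaviour in the question, where only root and members of that group can open the devices.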

Upvotes: 1
