Reputation: 91
How can I enable nvidia-smi for all users? I can run it with sudo, but as a regular user I get:
Failed to initialize NVML: Insufficient Permissions
Upvotes: 9
Views: 11229
Reputation: 21
Edit the container runtime config with sudo vim /etc/nvidia-container-runtime/config.toml
Change the commented line #user = "root:video"
to user = "root:vglusers"
and save.
Then maybe run sudo nvidia-smi
and sudo reboot -h
You need to run ll /dev/nvidia*
first to confirm the user:group that owns the nvidia devices.
Then add that group in the Dockerfile:
echo "vglusers:x:1001:${USER}" >> /etc/group
The one thing I am still unsure about is whether the value 1001 is available in a system variable.
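For what it's worth, the third field of an /etc/group entry is the numeric group ID (GID), so the 1001 above is the GID of the group; as far as I know there is no standard environment variable that holds it. A minimal sketch of reading it from the device node instead of hard-coding it, assuming GNU stat. The helper names owner_group and group_line are made up for illustration:

```shell
#!/bin/sh
# Print "name:gid" for the group that owns a path (GNU stat assumed).
owner_group() {
    stat -c '%G:%g' "$1"
}

# Format an /etc/group style entry from a group name, GID and member list.
group_line() {
    printf '%s:x:%s:%s\n' "$1" "$2" "$3"
}
```

On the host, owner_group /dev/nvidia0 shows the group name and GID to use, and group_line vglusers 1001 "$USER" produces the line to append to /etc/group in the image. getent group vglusers gives the same GID if the group exists on the host.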
Upvotes: 2
Reputation: 197
If your issue is happening when attempting to run the command from within a docker container AND you have SELinux activated (say using RHEL for example), then this might do the trick:
First check the SELinux context of your NVidia hardware:
$ ls -lZ /dev/nvidia*
crw-rw-rw-. 1 root root system_u:object_r:xserver_misc_device_t:s0 195, 0 Dec 17 09:44 /dev/nvidia0
crw-rw-rw-. 1 root root system_u:object_r:xserver_misc_device_t:s0 195, 255 Dec 17 09:44 /dev/nvidiactl
crw-rw-rw-. 1 root root system_u:object_r:xserver_misc_device_t:s0 243, 0 Dec 17 09:44 /dev/nvidia-fs0
crw-rw-rw-. 1 root root system_u:object_r:xserver_misc_device_t:s0 243, 1 Dec 17 09:44 /dev/nvidia-fs1
crw-rw-rw-. 1 root root system_u:object_r:xserver_misc_device_t:s0 243, 10 Dec 17 09:44 /dev/nvidia-fs10
...
As you can see, the security context (xserver_misc_device_t) does not allow containers to access the devices, so change the type to container_file_t:
$ chcon -t container_file_t /dev/nvidia*
Now check again the security context:
$ ls -lZ /dev/nvidia*
crw-rw-rw-. 1 root root system_u:object_r:container_file_t:s0 195, 0 Dec 17 09:44 /dev/nvidia0
crw-rw-rw-. 1 root root system_u:object_r:container_file_t:s0 195, 255 Dec 17 09:44 /dev/nvidiactl
crw-rw-rw-. 1 root root system_u:object_r:container_file_t:s0 243, 0 Dec 17 09:44 /dev/nvidia-fs0
crw-rw-rw-. 1 root root system_u:object_r:container_file_t:s0 243, 1 Dec 17 09:44 /dev/nvidia-fs1
crw-rw-rw-. 1 root root system_u:object_r:container_file_t:s0 243, 10 Dec 17 09:44 /dev/nvidia-fs10
...
This should allow your containers to access the hardware (at least for us it did the trick).
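If you have many device nodes, a small check can list the ones that still carry a different type. This is only a sketch: it assumes GNU stat (whose %C format prints the SELinux context), and the function names selinux_type and report_mislabeled are hypothetical helpers, not part of any NVIDIA or SELinux tooling.

```shell
#!/bin/sh
# Print the type field (third colon-separated field) of an SELinux context.
selinux_type() {
    printf '%s\n' "$1" | cut -d: -f3
}

# Print every existing path whose SELinux type differs from the wanted type.
# Paths that do not exist, or whose context cannot be read, are skipped.
report_mislabeled() {
    wanted=$1
    shift
    for dev in "$@"; do
        [ -e "$dev" ] || continue
        ctx=$(stat -c '%C' "$dev" 2>/dev/null) || continue
        [ "$(selinux_type "$ctx")" = "$wanted" ] || printf '%s %s\n' "$dev" "$ctx"
    done
}
```

Running report_mislabeled container_file_t /dev/nvidia* before and after the chcon shows which nodes still need relabeling; no output means every readable node already has the wanted type.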
Upvotes: 3
Reputation: 31
Following up on @Zealseeker's answer: since my machine has several users, I changed the permissions to rw-rw-rw- for the following devices (I have two cards):
ll /dev/nvidia*
crw-rw-rw- 1 root vglusers 195, 0 Oct 13 19:34 /dev/nvidia0
crw-rw-rw- 1 root vglusers 195, 1 Oct 13 19:34 /dev/nvidia1
crw-rw-rw- 1 root vglusers 195, 255 Oct 13 19:34 /dev/nvidiactl
...
I based this on the device files on a second machine, which had not been updated to the nvidia-470 driver.
Upvotes: 0
Reputation: 1
See the related nvidia-docker GitHub issue.
Modify /etc/nvidia-container-runtime/config.toml:
uncomment user = "root:video"
and change it to user = "root:root"
Something else I did: before modifying the config file, I deleted the vglusers
group.
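The config edit described above can also be scripted. A minimal sketch, assuming the file contains a single user = line (commented or not) as in the answers here; set_runtime_user is a made-up helper name, and the sed call keeps a .bak copy of the original file:

```shell
#!/bin/sh
# Replace the (possibly commented-out) user = "..." line in a config file.
# $1 is the path to the config file, $2 the new user:group value.
set_runtime_user() {
    sed -i.bak -E "s|^#?[[:space:]]*user[[:space:]]*=.*|user = \"$2\"|" "$1"
}
```

For example, sudo sh -c '. ./helper.sh; set_runtime_user /etc/nvidia-container-runtime/config.toml root:root' would apply the change from this answer (helper.sh being whatever file you saved the function in); try it on a copy of the file first.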
Upvotes: 0
Reputation: 823
I had this problem, and here is my solution. Maybe it will help you.
With ll /dev/nvidia*
you can see that the devices belong to the user root
and the group vglusers.
If your situation is the same as mine, you should now add your user account to the vglusers
group,
either with usermod -a -G vglusers username
(requires sudo)
or by editing /etc/group
and appending your username to the end of the line vglusers:x:****:user1,user2,...
Then log out of the shell and log back in.
Notes:
The group may be named video
rather than vglusers
; you'll see which name is used when you run ll /dev/nvidia*
If you are working through vncserver
, you have to kill the server and restart it, because you did not have the permission when you started the vncserver
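A quick way to confirm that the group change took effect, as a sketch (user_in_group is a hypothetical helper name). Note that id -nG username reads the group database, while plain id -nG reports the groups of the current session, which is why a fresh login (or a restarted vncserver) is needed before nvidia-smi works:

```shell
#!/bin/sh
# Succeed when user $1 is listed as a member of group $2 in the group database.
user_in_group() {
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qx -- "$2"
}
```

For example, user_in_group "$USER" vglusers && echo ok checks the database entry, and id -nG with no arguments shows whether the current session has already picked the group up.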
Upvotes: 7
Reputation: 21
Have you ever installed VirtualGL? I had the same problem, and it turned out that my VirtualGL installation was the cause. Run the VirtualGL installation file and select "Unconfigure server for use with VirtualGL". After that, everything works normally. The tool prints this notice:
IMPORTANT NOTE: Your system uses modprobe.d to set device permissions. You must execute rmmod nvidia with the display manager stopped in order for the new device permission settings to become effective.
Upvotes: 1