Reputation: 5154
Since CUDA 3.1 it is possible to limit the list of GPUs visible to an application by setting the CUDA_VISIBLE_DEVICES environment variable.
This affects both the Runtime API and the Driver API (I checked this myself to be sure). It seems that device filtering is enforced somewhere at the driver level, and there seems to be no way to ignore it.
However, I've encountered a closed-source application that somehow ignores this variable and always uses device 0, even if CUDA_VISIBLE_DEVICES is set to an empty string, which means the application should not see any CUDA-capable device.
The application in question uses the same CUDA libraries as a dummy application that counts the available devices:
$ ldd a.out # dummy
linux-vdso.so.1 => (0x00007fff7ec60000)
libcuda.so.1 => /usr/lib64/libcuda.so.1 (0x00007f606783a000)
libcudart.so.4 => /usr/local/cuda41/cuda/lib64/libcudart.so.4 (0x00007f60675e3000)
libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007f60672dd000)
libm.so.6 => /lib64/libm.so.6 (0x00007f606704e000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f6066e37000)
libc.so.6 => /lib64/libc.so.6 (0x00007f6066aa7000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f606688b000)
libz.so.1 => /lib64/libz.so.1 (0x00007f6066674000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f6066470000)
librt.so.1 => /lib64/librt.so.1 (0x00007f6066268000)
/lib64/ld-linux-x86-64.so.2 (0x00007f6068232000)
$ ldd ../../bin/one.closed.source.application # application in question
linux-vdso.so.1 => (0x00007fffcf99c000)
libcufft.so.4 => /usr/local/cuda41/cuda/lib64/libcufft.so.4 (0x00007f06ce53a000)
libcuda.so.1 => /usr/lib64/libcuda.so.1 (0x00007f06cdb44000)
libcudart.so.4 => /usr/local/cuda41/cuda/lib64/libcudart.so.4 (0x00007f06cd8ed000)
libz.so.1 => /lib64/libz.so.1 (0x00007f06cd6cb000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f06cd4c7000)
librt.so.1 => /lib64/librt.so.1 (0x00007f06cd2bf000)
libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007f06ccfb8000)
libm.so.6 => /lib64/libm.so.6 (0x00007f06ccd34000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f06ccb1e000)
libc.so.6 => /lib64/libc.so.6 (0x00007f06cc78d000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f06cc571000)
/lib64/ld-linux-x86-64.so.2 (0x00007f06d0110000)
I'm curious how it is possible to do this trick.
Upvotes: 3
Views: 3746
Reputation: 5154
Rubber duck debugging really works.
It turns out it is enough to call unsetenv before calling cuInit or cudaSetDevice, and the initial value of the environment variable will be ignored.
#include <stdio.h>
#include <stdlib.h>
#include <cuda.h>

int main(void) {
    int count = 0;

    /* Remove the variable before the driver is initialized. */
    unsetenv("CUDA_VISIBLE_DEVICES");
    cuInit(0);

    /* Now we see all the devices on the machine. */
    cuDeviceGetCount(&count);
    printf("%d\n", count);
    return 0;
}
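The reason this works can be sketched without any GPU at all: the environment is read when the library initializes, not when the process starts. The snippet below is only an illustration under that assumption. `mock_visible_device_count` is a hypothetical stand-in for cuInit plus cuDeviceGetCount, and its parsing of the variable is deliberately simplified (it just counts comma-separated entries); it is not how the real driver is implemented.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* expose setenv()/unsetenv() in <stdlib.h> */
#endif
#include <stdlib.h>

/* Hypothetical stand-in for cuInit() + cuDeviceGetCount().
 * Assumption: the variable is looked up at the moment of the call,
 * so only its value at that moment matters. */
static int mock_visible_device_count(int physical_devices)
{
    const char *mask = getenv("CUDA_VISIBLE_DEVICES");

    if (mask == NULL)
        return physical_devices;   /* variable absent: no filtering */
    if (mask[0] == '\0')
        return 0;                  /* empty string: hide every device */

    /* Simplified parsing: count the comma-separated entries. */
    int listed = 1;
    for (const char *p = mask; *p != '\0'; ++p)
        if (*p == ',')
            ++listed;
    return listed < physical_devices ? listed : physical_devices;
}

/* What the application in question appears to do:
 * drop the variable first, then initialize. */
static int count_ignoring_mask(int physical_devices)
{
    unsetenv("CUDA_VISIBLE_DEVICES");
    return mock_visible_device_count(physical_devices);
}
```

With CUDA_VISIBLE_DEVICES set to an empty string, `mock_visible_device_count(2)` returns 0, while `count_ignoring_mask(2)` returns 2, because by the time the lookup happens the variable is already gone from the process environment.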
Upvotes: 5