aland

Reputation: 5154

Ignoring `CUDA_VISIBLE_DEVICES` environment variable

Since CUDA 3.1 it is possible to limit the list of GPUs visible to an application by setting the CUDA_VISIBLE_DEVICES environment variable.

This affects both the Runtime API and the Driver API (to be sure, I've checked it myself). It seems that device filtering is enforced somewhere at the driver level, and there is no way to ignore it.

However, I've encountered one closed-source application which seems to somehow ignore this variable and always uses device 0, even if we set CUDA_VISIBLE_DEVICES to an empty string, which means the application should not see any CUDA-capable device.
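For reference, something like the following minimal Runtime API program is enough to count visible devices (a sketch; the exact dummy I used is not important). With CUDA_VISIBLE_DEVICES set to an empty string, cudaGetDeviceCount fails and no device is reported:

#include <stdio.h>
#include <cuda_runtime.h>

int main(void) {
  int n = 0;
  // With CUDA_VISIBLE_DEVICES="" this call fails and n stays 0
  if (cudaGetDeviceCount(&n) != cudaSuccess)
    n = 0;
  printf("%d\n", n);
  return 0;
}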

The application in question uses the same CUDA libraries as this dummy device counter:

$ ldd a.out  # dummy
    linux-vdso.so.1 =>  (0x00007fff7ec60000)
    libcuda.so.1 => /usr/lib64/libcuda.so.1 (0x00007f606783a000)
    libcudart.so.4 => /usr/local/cuda41/cuda/lib64/libcudart.so.4 (0x00007f60675e3000)
    libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007f60672dd000)
    libm.so.6 => /lib64/libm.so.6 (0x00007f606704e000)
    libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f6066e37000)
    libc.so.6 => /lib64/libc.so.6 (0x00007f6066aa7000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f606688b000)
    libz.so.1 => /lib64/libz.so.1 (0x00007f6066674000)
    libdl.so.2 => /lib64/libdl.so.2 (0x00007f6066470000)
    librt.so.1 => /lib64/librt.so.1 (0x00007f6066268000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f6068232000)


$ ldd ../../bin/one.closed.source.application # application in question
    linux-vdso.so.1 =>  (0x00007fffcf99c000)
    libcufft.so.4 => /usr/local/cuda41/cuda/lib64/libcufft.so.4 (0x00007f06ce53a000)
    libcuda.so.1 => /usr/lib64/libcuda.so.1 (0x00007f06cdb44000)
    libcudart.so.4 => /usr/local/cuda41/cuda/lib64/libcudart.so.4 (0x00007f06cd8ed000)
    libz.so.1 => /lib64/libz.so.1 (0x00007f06cd6cb000)
    libdl.so.2 => /lib64/libdl.so.2 (0x00007f06cd4c7000)
    librt.so.1 => /lib64/librt.so.1 (0x00007f06cd2bf000)
    libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007f06ccfb8000)
    libm.so.6 => /lib64/libm.so.6 (0x00007f06ccd34000)
    libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f06ccb1e000)
    libc.so.6 => /lib64/libc.so.6 (0x00007f06cc78d000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f06cc571000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f06d0110000)

I'm curious how it is possible to do this trick.

Upvotes: 3

Views: 3746

Answers (1)

aland

Reputation: 5154

Rubber duck debugging really works.

It turns out it is enough to call unsetenv before calling cuInit or cudaSetDevice, and the initial value of the environment variable will be ignored.

#include <stdio.h>
#include <stdlib.h>
#include <cuda.h>

int main(void) {
  int x;
  // Remove the variable before the driver has a chance to read it
  unsetenv("CUDA_VISIBLE_DEVICES");
  cuInit(0);
  // Now we see all the devices on the machine
  cuDeviceGetCount(&x);
  printf("%d\n", x);
  return 0;
}
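Presumably the closed-source application does something similar. Since the driver apparently reads the variable only once, during cuInit, overwriting it should work just as well as unsetting it. Here is a minimal sketch of how an application could pin itself to physical device 0 no matter what the caller exports (my guess at the trick, not the actual code of that application):

#include <stdlib.h>
#include <cuda.h>

int main(void) {
  CUdevice dev;
  // Overwrite the caller's setting; the third argument (1) forces the overwrite
  setenv("CUDA_VISIBLE_DEVICES", "0", 1);
  cuInit(0);
  // Logical device 0 is now physical device 0, the only visible device
  cuDeviceGet(&dev, 0);
  return 0;
}

Both snippets need to link against the driver library, e.g. gcc test.c -I/usr/local/cuda/include -lcuda.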

Upvotes: 5
