Reputation: 19468
I can run a job on Slurm with, for example, srun --gpus=2, and it will set CUDA_VISIBLE_DEVICES to the GPUs allocated. However, I know of no way to inspect which GPUs Slurm allocated to a particular job. If I run scontrol show job, it shows me something like TresPerJob=gpu:2, but it doesn't contain the actual GPUs allocated.
Where can I find this information? In other words, how can I look up which GPUs job n was allocated?
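Concretely, this is roughly what I am doing (the job ID 1234 and the nvidia-smi payload command are just placeholders for illustration):

    # launch with two GPUs; CUDA_VISIBLE_DEVICES is set inside the step
    srun --gpus=2 nvidia-smi

    # reports the GPU count (TresPerJob=gpu:2) but not which devices
    scontrol show job 1234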
Upvotes: 8
Views: 14784
Reputation: 266
scontrol show job -d can do this. The -d flag adds extra detail to the output, including a field like GRES=gpu(IDX:0-2).
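A minimal example, assuming a placeholder job ID of 1234:

    # -d prints the detailed per-node allocation, including GPU indices
    scontrol show job -d 1234 | grep -i gres

On a node where the job holds the first three GPUs, the detail lines will contain something like GRES=gpu(IDX:0-2).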
Upvotes: 14
Reputation: 56
If you're just looking for what Slurm set CUDA_VISIBLE_DEVICES to, I'd suggest using cat /proc/12345/environ, where the number is the PID of whatever Slurm launched. This is liable to be overridden, however, with something like srun --export=ALL bash -i, so you can't rely on it in the adversarial case.
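For example, with 12345 standing in for the real PID (note that /proc/<pid>/environ is NUL-separated, so split it before searching):

    # dump the process environment one variable per line and pick out the GPU list
    tr '\0' '\n' < /proc/12345/environ | grep CUDA_VISIBLE_DEVICES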
Upvotes: 4
Reputation: 4571
When you execute the nvidia-smi command, you get a table listing the GPUs in the node. The "GPU" column is the ID of the GPU, which usually matches the device files in the system (ls /dev/nvidia*). This same identification is used by Slurm in the CUDA_VISIBLE_DEVICES environment variable.
So, when you see
0,1,2
in this variable, it means that the job has been assigned the GPUs whose IDs are 0, 1 and 2.
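A quick sanity check from inside a job, sketched here as an illustration rather than taken from the Slurm docs:

    # GPU indices and names as nvidia-smi reports them
    nvidia-smi --query-gpu=index,name --format=csv,noheader

    # the indices Slurm exported for this job step
    echo "$CUDA_VISIBLE_DEVICES"

The values in the variable should correspond to entries in the index column of the nvidia-smi listing.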
Upvotes: 2