schmmd

Reputation: 19468

How do I know which GPUs a job was allocated using SLURM?

I can run a job on SLURM with, for example, srun --gpus=2, and it will set CUDA_VISIBLE_DEVICES to the GPUs allocated. However, I know of no way to inspect which GPUs SLURM allocated to a particular job. If I run scontrol show job, it shows something like TresPerJob=gpu:2, but that doesn't include the actual GPUs allocated.

Where can I find this information? In other words, how can I look up which GPUs job n was allocated?
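For concreteness, this is roughly what I'm doing (the job ID 12345 and script name are just placeholders):

    srun --gpus=2 ./my_job.sh
    scontrol show job 12345 | grep -i tres
    # shows TresPerJob=gpu:2, but not which GPU indices were assigned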

Upvotes: 8

Views: 14784

Answers (3)

midiarsi

Reputation: 266

scontrol show job -d can do this. The -d flag adds extra details to the output, including a field like GRES=gpu(IDX:0-2) that lists the allocated GPU indices.
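For example, something along these lines (the job ID is a placeholder, and the exact GRES line depends on your Slurm version and configuration):

    scontrol show job -d 12345 | grep -i gres
    #   ... GRES=gpu(IDX:0-2)   -> this job got GPUs 0, 1 and 2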

Upvotes: 14

Brendan

Reputation: 56

If you're just looking for what Slurm set CUDA_VISIBLE_DEVICES to, I'd suggest using cat /proc/12345/environ, where the number is the PID of whatever Slurm launched.

This is liable to be overridden, however, with something like srun --export=ALL bash -i, so you can't rely on it in the adversarial case.
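As a rough sketch (12345 stands in for the PID of the job step's process; the environ file is NUL-separated, so tr makes it readable):

    tr '\0' '\n' < /proc/12345/environ | grep CUDA_VISIBLE_DEVICES
    # e.g. CUDA_VISIBLE_DEVICES=0,1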

Upvotes: 4

Bub Espinja

Reputation: 4571

When you execute the nvidia-smi command, you get something like this:

(screenshot of nvidia-smi output: a table whose "GPU" column lists each device's ID)

The "GPU" column is the ID of the GPU which usually matches the device in the system (ls /dev/nvidia*). This same identification is used by Slurm in CUDA_VISIBLE_DEVICES environment variable. So, when in this variable you see

0,1,2

means that the job has been assigned with the GPUs whose IDs are 0, 1 and 2.
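For example, from inside the job step (the values shown are just an illustration):

    echo $CUDA_VISIBLE_DEVICES    # e.g. 0,1,2
    ls /dev/nvidia*               # the GPU device nodes present on the node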

Upvotes: 2
