Reputation: 385
I am scheduling jobs on a cluster that take up either 1 or 2 GPUs of some nodes. I frequently use sinfo -p gpu
to list all nodes of the 'gpu' partition along with their state. Some appear with the state 'idle', indicating that no job is running on them. Others appear with the state 'mix', meaning that some job is running on them.
However, there is no information on how many GPUs of a mixed-state node are actually taken. Is there any command, ideally sinfo-based, that tells me the number of free GPUs, possibly per node?
The sinfo manual did not give any insights except the output option "%G", which only prints the number of GPUs available in general. Thanks!
Update: I realized that I can use "%C" to print the allocated/idle/other/total CPU counts per node with the following command:
--format="%9P %l %10n %.14C %.10T "
I want to do the exact same thing but with GPUs instead of CPUs.
Upvotes: 3
Views: 2784
Reputation: 59250
Unfortunately sinfo does not provide the information right away. You will have to parse the output of scontrol:
scontrol -o show node | grep -Po "AllocTRES[^ ]*(?<=gpu=)\K[0-9]+" | paste -d + -s | bc
This lists all nodes, extracts the part that corresponds to AllocTRES (allocated trackable resources, of which GPUs are a part), and, within that part, specifically the GPU count. It then uses paste and bc to compute the sum (you could use awk instead if you prefer).
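For instance, the paste/bc pair can be collapsed into a single awk call. A minimal sketch, run here against made-up sample output (the node names and TRES strings are illustrative assumptions) rather than a live cluster:

```shell
# Sample of what `scontrol -o show node` prints (one line per node);
# the node names and TRES values here are made up for illustration.
sample='NodeName=node001 CfgTRES=cpu=32,mem=128G,gres/gpu=2 AllocTRES=cpu=8,mem=16G,gres/gpu=1
NodeName=node002 CfgTRES=cpu=32,mem=128G,gres/gpu=2 AllocTRES=cpu=16,mem=32G,gres/gpu=2
NodeName=node003 CfgTRES=cpu=32,mem=128G,gres/gpu=2 AllocTRES='

# Same extraction as the one-liner, but summed with awk instead of paste + bc.
total=$(echo "$sample" \
  | grep -Po "AllocTRES[^ ]*(?<=gpu=)\K[0-9]+" \
  | awk '{sum += $1} END {print sum}')
echo "$total"   # prints 3 for the sample above

# On a real cluster, feed the live output instead:
#   scontrol -o show node | grep -Po "AllocTRES[^ ]*(?<=gpu=)\K[0-9]+" | awk '{sum += $1} END {print sum}'
```

Note that node003 contributes nothing to the sum: its AllocTRES field is empty, so the grep pattern simply does not match that line.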
If you replace Alloc
with Cfg
in the one-liner, you will have the total number of GPUs configured.
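Combining the two, you can get the per-node count of free GPUs (what the question ultimately asks for) by subtracting the allocated count from the configured one on each line. A minimal sketch, run against made-up sample output (assumed node names and TRES values) rather than a live cluster:

```shell
# Sample `scontrol -o show node` output; node names and TRES values
# are illustrative only.
sample='NodeName=node001 CfgTRES=cpu=32,mem=128G,gres/gpu=2 AllocTRES=cpu=8,mem=16G,gres/gpu=1
NodeName=node002 CfgTRES=cpu=32,mem=128G,gres/gpu=2 AllocTRES=cpu=16,mem=32G,gres/gpu=2
NodeName=node003 CfgTRES=cpu=32,mem=128G,gres/gpu=2 AllocTRES='

# For every node line, pull the gpu= figure out of both CfgTRES and
# AllocTRES and print "name free" where free = configured - allocated.
free=$(echo "$sample" | awk '{
  name = ""; cfg = 0; alloc = 0
  for (i = 1; i <= NF; i++) {
    if ($i ~ /^NodeName=/) { name = substr($i, 10) }
    if ($i ~ /^CfgTRES=/   && match($i, /gpu=[0-9]+/)) cfg   = substr($i, RSTART + 4, RLENGTH - 4)
    if ($i ~ /^AllocTRES=/ && match($i, /gpu=[0-9]+/)) alloc = substr($i, RSTART + 4, RLENGTH - 4)
  }
  print name, cfg - alloc
}')
echo "$free"
# node001 1
# node002 0
# node003 2

# On a real cluster:  scontrol -o show node | awk '...same program...'
```

An empty AllocTRES (node003) never matches the gpu= pattern, so alloc stays 0 and all configured GPUs count as free.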
Upvotes: 2