Charlie Parker

Reputation: 5169

How do I see the memory of the GPUs I have available in a slurm partition/queue?

I want to see how much memory the GPUs have before I submit my jobs. I managed to get Slurm to tell me the model:

(automl-meta-learning) [miranda9@golubh3 ~]$ sinfo -o %G -p eng-research
GRES
gpu:P100:4
(null)
gpu:V100:2
(automl-meta-learning) [miranda9@golubh3 ~]$ sinfo -o %G -p secondary   
GRES
(null)
gpu:V100:2
gpu:V100:1
gpu:K80:4
gpu:TeslaK40M:2

but I want to see the amount of memory. I am aware I could log in to a node with srun and inspect the resources with nvidia-smi, BUT the queue is so full it can take up to 16h to give me resources. How do I just get Slurm to tell me how much memory these GPUs have?
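One workaround I'm considering (untested sketch; the partition name is taken from above, the job name is a placeholder) is to queue a tiny batch job that just prints the GPU memory whenever the allocation eventually comes through, so I don't have to sit and wait on srun:

sbatch -p eng-research --gres=gpu:1 -t 00:05:00 -J gpu-mem-probe \
    --wrap="nvidia-smi --query-gpu=name,memory.total --format=csv"
# the result lands in slurm-<jobid>.out in the submission directory once the job runs
cat slurm-<jobid>.out

but that still only reports the memory after the job has waited in the queue, which is what I'm trying to avoid.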

Upvotes: 2

Views: 3559

Answers (1)

damienfrancois

Reputation: 59250

Unless the system administrators have encoded the GPU memory as a node "feature", Slurm currently has no knowledge of the GPU memory. This could change in the future with the work on integrating the NVIDIA Management Library (NVML) into Slurm, but until then, you can either ask the system administrators, look in the documentation of your cluster, or check the specification sheets of the cards: V100 cards have either 16GB or 32GB of memory, K80s have 24GB, and K40Ms have 12GB.
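If the administrators did encode it as a feature, you can see it without an allocation; something along these lines should show it (the node name below is just an example, substitute one from your partition):

sinfo -N -p eng-research -o "%N %G %f"    # per-node GRES and feature list
scontrol show node <nodename> | grep -iE "gres|features"

If the feature column only shows (null) or unrelated tags, then the memory simply is not recorded anywhere Slurm can report it.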

Upvotes: 3
