Reputation: 5169
I want to see how much memory the GPUs have before I submit my jobs. I managed to get Slurm to tell me the model:
(automl-meta-learning) [miranda9@golubh3 ~]$ sinfo -o %G -p eng-research
GRES
gpu:P100:4
(null)
gpu:V100:2
(automl-meta-learning) [miranda9@golubh3 ~]$ sinfo -o %G -p secondary
GRES
(null)
gpu:V100:2
gpu:V100:1
gpu:K80:4
gpu:TeslaK40M:2
but I want to see the amount of memory each GPU has. I am aware I could log in to a node with srun
and inspect the resources with nvidia-smi,
BUT the queue is so full that it can take up to 16 hours to give me resources. How do I get Slurm itself to tell me how much memory these queued GPUs have?
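For reference, this is roughly the srun workaround I mean (just a sketch; the partition name is taken from the sinfo output above, and the exact --gres string depends on how the cluster is configured):
srun -p eng-research --gres=gpu:1 --pty nvidia-smi --query-gpu=name,memory.total --format=csv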
Upvotes: 2
Views: 3559
Reputation: 59250
Unless the system administrators have encoded the GPU memory as a node "feature", Slurm currently has no knowledge of the GPU memory. This could change in the future with the work on integrating the NVIDIA Management Library (NVML) into Slurm, but until then you can either ask the system administrators, look in the documentation of your cluster, or check the specification sheets of the cards: V100 cards have either 16GB or 32GB of memory, K80s have 24GB, and K40Ms have 12GB.
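To check whether such a feature has been defined on your cluster, something like this would show it (a sketch only; the node name is a placeholder, and whether any memory-related tag appears depends entirely on how your administrators configured the nodes):
sinfo -p eng-research -o "%N %G %f"          # %f prints the node features next to the GRES
scontrol show node <nodename> | grep -i features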
Upvotes: 3