Maciej

Reputation: 135

Is it possible to get, from Slurm, the list of cores on which my MPI job is running, and if so, how?

The question: is it possible, and if so how, to get the list of cores on which my MPI job is running at a given moment?

It is easy to list the nodes to which the job has been assigned, but after a few hours spent searching the internet I am starting to suspect that Slurm does not expose the core list in any way (though why wouldn't it?).

The thing is, I want to double-check whether the cluster I am working on really spreads the processes of my job across nodes and cores (and, if possible, sockets) as I ask it to (call me paranoid if you will).

Please note that hwloc is not an answer to my question: I am asking whether it is possible to get this information from Slurm, not from inside my program (call me curious if you will).

Closely related to (but definitely not the same as) another similar question.

Upvotes: 0

Views: 809

Answers (1)

Gilles Gouaillardet

Reputation: 8395

Well, that depends on your MPI library (MPICH-based, Open MPI-based, or other), on how you run your MPI app (via mpirun, or direct launch via srun), and on your SLURM config.

If you direct launch, SLURM is the one that may do the binding, and srun --cpu_bind=verbose ... should report how each task is bound.
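A minimal sketch of a per-task check for the direct-launch case, assuming Linux compute nodes (for /proc/self/status) and a POSIX sh. The script name task_binding.sh is hypothetical. SLURM_PROCID and SLURMD_NODENAME are set by srun in each task's environment; outside SLURM the script falls back to "?" and uname -n. Each task prints the CPU list the kernel actually allows it, which can be compared with what --cpu_bind=verbose prints.

```shell
#!/bin/sh
# task_binding.sh (hypothetical name): print this task's SLURM task id,
# node name and the CPUs the kernel allows it to run on.
report_task_binding() {
    # Set by srun for every task; fall back when run outside SLURM.
    task="${SLURM_PROCID:-?}"
    node="${SLURMD_NODENAME:-$(uname -n)}"
    # /proc/self/status is read by awk, which inherits the task's CPU
    # affinity, so Cpus_allowed_list reflects the task's binding.
    cpus=$(awk '/^Cpus_allowed_list:/ {print $2}' /proc/self/status)
    echo "task $task on $node: cpus $cpus"
}

report_task_binding
```

Possible usage: srun --cpu_bind=verbose sh ./task_binding.sh, so that SLURM's own verbose report and the kernel's view of each task end up side by side.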

If you use mpirun, SLURM only spawns one proxy on each node. In the case of Open MPI, the spawn command is srun --cpu_bind=none orted ..., so unless SLURM is configured to restrict the available cores (for example, if you configured cpusets and nodes are not in exclusive mode), all the cores can be used by the MPI tasks, and it is then up to the MPI library to bind the MPI tasks within the available cores.

If you want to know which cores are available, you can run srun -N $SLURM_NNODES -n $SLURM_NNODES --cpu_bind=none grep Cpus_allowed_list /proc/self/status
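A sketch of the same idea as a small script that also counts the available CPUs, which makes it easier to spot a node where a cpuset restricts the job. It assumes Linux and a POSIX sh with awk; the script and function names are hypothetical. The cpu-list format parsed here ("0-3,8,10-11") is the one the kernel uses for Cpus_allowed_list.

```shell
#!/bin/sh
# node_cpus.sh (hypothetical name): print which CPUs a task may use on
# this node, and how many there are.

# Count the CPUs in a Linux cpu-list string such as "0-3,8,10-11".
count_cpus() {
    printf '%s\n' "$1" | awk -F, '{
        n = 0
        for (i = 1; i <= NF; i++) {
            # Each field is either a range "a-b" or a single CPU id.
            if (split($i, r, "-") == 2) n += r[2] - r[1] + 1
            else if ($i != "") n += 1
        }
        print n
    }'
}

report_node_cpus() {
    # awk inherits the task's affinity, so this is the task's allowed list.
    list=$(awk '/^Cpus_allowed_list:/ {print $2}' /proc/self/status)
    echo "$(uname -n): $(count_cpus "$list") cpus available ($list)"
}

report_node_cpus
```

Possible usage, one task per node with binding disabled: srun -N "$SLURM_NNODES" -n "$SLURM_NNODES" --cpu_bind=none sh ./node_cpus.sh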

If you want to know how the tasks are bound, you can run mpirun grep Cpus_allowed_list /proc/self/status
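A sketch of a wrapper that tags each line with the MPI rank, so the output of mpirun can be sorted and checked against the requested layout. It assumes Linux and a POSIX sh; the script name is hypothetical. OMPI_COMM_WORLD_RANK is exported by Open MPI and PMI_RANK by MPICH's Hydra launcher; SLURM_PROCID is the fallback for srun, and other MPI libraries may use different variables.

```shell
#!/bin/sh
# rank_binding.sh (hypothetical name): print the MPI rank, node and CPU
# binding of the current process.
report_rank_binding() {
    # Open MPI: OMPI_COMM_WORLD_RANK, MPICH/Hydra: PMI_RANK, srun: SLURM_PROCID.
    rank="${OMPI_COMM_WORLD_RANK:-${PMI_RANK:-${SLURM_PROCID:-?}}}"
    # awk inherits the process's affinity, so this shows the rank's binding.
    cpus=$(awk '/^Cpus_allowed_list:/ {print $2}' /proc/self/status)
    echo "rank $rank on $(uname -n): cpus $cpus"
}

report_rank_binding
```

Possible usage: mpirun sh ./rank_binding.sh | sort -k2,2n, which lists ranks in order so that the node and core assignment of consecutive ranks can be read off directly.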

Or you can ask MPI to report it: if I remember correctly, with Open MPI you can use mpirun --report-bindings ...

Upvotes: 1
