Evan

Reputation: 477

Using --array and --nodelist in sbatch

Because of limitations in how MATLAB uses resources on a computing cluster, I want to create several jobs, each of which uses all of the cores on a given node. I can use the --array option in conjunction with other parameters to make sure that each job lands on a separate node. However, for some reason the Slurm scheduler on our cluster is putting my jobs on nodes that are already in use, even though I'm trying to max out the cores on a given node using the -c option:

#SBATCH --array=1-2
#SBATCH -t 24:00:00
#SBATCH -n 1
#SBATCH -c 20
#SBATCH -N 1
#SBATCH --exclusive
#SBATCH --mem-per-cpu 4000

module add ~/matlab/2014a

srun matlab -nodisplay -r "myfun($SLURM_ARRAY_TASK_ID);quit"

Using the --exclusive option doesn't seem to change anything. I've been having the same problem with single tasks as well, and my workaround has been to check which nodes aren't in use and request those specifically with the --nodelist option. Is there a way to use --array in conjunction with --nodelist so that each job and node in the list are matched in one-to-one correspondence? Right now SLURM is trying to use all the nodes for each job.

Upvotes: 1

Views: 2503

Answers (1)

damienfrancois

Reputation: 59110

Three possibilities:

  1. Either the nodes have ghost jobs running outside of Slurm's control, because of ill-terminated previous jobs or because of unfair cluster usage by other users. As Slurm does not check the load of nodes before allocating them, you can end up in the situation you are describing.

  2. Or the Shared parameter of slurm.conf could be set to Force, which denies you the use of --exclusive, and hyperthreading could be enabled, leading Slurm to consider that each node has 40 CPUs.

  3. Or the Shared parameter of slurm.conf could be set to something other than Exclusive while the nodes are in two distinct partitions, a configuration that leads to node over-subscription.
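To check the first possibility, one rough approach is to compare the load that each node reports with the number of CPUs Slurm believes it has allocated there. The sketch below assumes the one-line-per-node output of scontrol -o show node, which contains NodeName=, CPUAlloc= and CPULoad= fields. The function name ghost_nodes and the load threshold of 1.0 are illustrative choices, not part of Slurm.

```shell
# Hypothetical helper: list nodes that report a noticeable load although
# Slurm has allocated none of their CPUs (a hint at processes running
# outside of Slurm's control).
# Reads "scontrol -o show node" style output on stdin, one node per line.
# The 1.0 threshold is an arbitrary assumption; adjust it to your cluster.
ghost_nodes() {
    awk '{
        name = ""; alloc = ""; load = ""
        for (i = 1; i <= NF; i++) {
            split($i, kv, "=")
            if (kv[1] == "NodeName") name  = kv[2]
            if (kv[1] == "CPUAlloc") alloc = kv[2]
            if (kv[1] == "CPULoad")  load  = kv[2]
        }
        # Non-numeric loads such as "N/A" evaluate to 0 and are skipped.
        if (name != "" && alloc != "" && alloc + 0 == 0 && load + 0 > 1.0)
            print name, load
    }'
}

# Usage on a cluster (requires scontrol on the PATH):
#   scontrol -o show node | ghost_nodes
```

A node listed here is not proof of a ghost job, since the load average also lags behind jobs that have just finished, but it is a reasonable place to start looking.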

Use the scontrol show config command to get more information about the configuration.
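For the second and third possibilities, the relevant settings are spread over the global, partition and node views. The following is a minimal sketch of how they might be pulled out. The helper pick_fields is hypothetical, and it assumes the space-separated key=value layout printed by scontrol show partition and scontrol show node. Note that in Slurm 16.05 and later the Shared parameter is named OverSubscribe, so both names are requested.

```shell
# Hypothetical helper: from key=value output on stdin, keep only the
# fields whose keys are given as arguments.
# The body runs in a subshell so that changing IFS does not leak out.
pick_fields() (
    IFS='|'
    pattern="$*"          # joins the arguments as key1|key2|...
    tr -s ' ' '\n' | grep -E "^(${pattern})="
)

# Usage on a cluster (requires scontrol on the PATH):
#   scontrol show config | grep -i 'SelectType'
#   scontrol show partition | pick_fields PartitionName Shared OverSubscribe Nodes
#   scontrol show node | pick_fields NodeName CPUTot ThreadsPerCore
```

If CPUTot is 40 on a node with 20 physical cores and ThreadsPerCore is 2, Slurm is counting hardware threads as CPUs, which matches the second possibility. A node name that appears under the Nodes field of two partitions points to the third.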

Upvotes: 1
