Faber

Reputation: 1562

How to submit a job to any [subset] of nodes from nodelist in SLURM?

I have a couple of thousand jobs to run on a SLURM cluster with 16 nodes. These jobs should run only on a subset of 7 of the available nodes. Some of the tasks are parallelized and use all the CPU power of a single node, while others are single-threaded. Therefore, multiple jobs should run at the same time on a single node. None of the tasks should span multiple nodes.

Currently I submit each of the jobs as follows:

sbatch --nodelist=myCluster[10-16] myScript.sh

However, this parameter makes Slurm wait until the submitted job terminates before starting the next one, which leaves 3 nodes completely unused and, depending on the task (multi- or single-threaded), may also leave the currently active node under low CPU load.

What sbatch parameters will make Slurm run multiple jobs at the same time on the specified nodes?

Upvotes: 42

Views: 69808

Answers (3)

damienfrancois

Reputation: 59330

You can work the other way around: rather than specifying which nodes to use (which has the effect that each job is allocated all 7 of them), specify which nodes not to use:

sbatch --exclude=myCluster[01-09] myScript.sh

and Slurm will never allocate more than 7 nodes to your jobs. Make sure, though, that the cluster configuration allows node sharing, and that your myScript.sh contains #SBATCH --ntasks=1 --cpus-per-task=n, where n is the number of threads of each job.
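For illustration, a minimal myScript.sh could look like the following sketch (my_program is a hypothetical placeholder for your executable; set --cpus-per-task to the thread count of each job):

#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4   # e.g. 4 threads; use 1 for the single-threaded jobs

srun ./my_program

You would then submit it with sbatch --exclude=myCluster[01-09] myScript.sh as above, and jobs can share a node as long as free cores remain.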

Update: since version 23.02, --nodelist may contain more nodes than specified by --nodes. From the changelog:

-- Allow for --nodelist to contain more nodes than required by --nodes.
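So on Slurm 23.02 or later, something like this (a sketch, untested here) should let each job run on a single node chosen from the seven:

sbatch --nodelist=myCluster[10-16] --nodes=1 myScript.sh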

Upvotes: 62

Faber

Reputation: 1562

Actually, I think the way to go is to set up a 'reservation' first, as described in this presentation (last slide): http://slurm.schedmd.com/slurm_ug_2011/Advanced_Usage_Tutorial.pdf

Scenario: Reserve ten nodes in the default SLURM partition starting at noon and with a duration of 60 minutes occurring daily. The reservation will be available only to users alan and brenda.

scontrol create reservation user=alan,brenda starttime=noon duration=60 flags=daily nodecnt=10
Reservation created: alan_6

scontrol show res
ReservationName=alan_6 StartTime=2009-02-05T12:00:00
    EndTime=2009-02-05T13:00:00 Duration=60 Nodes=sun[000-003,007,010-013,017] NodeCnt=10 Features=(null) PartitionName=pdebug Flags=DAILY Licenses=(null)
    Users=alan,brenda Accounts=(null)

# submit job with:
sbatch --reservation=alan_6 myScript.sh

Unfortunately I couldn't test this procedure, probably due to a lack of privileges.

Upvotes: 0

Riccardo Murri

Reputation: 1065

Some of the tasks are parallelized, hence use all the CPU power of a single node while others are single threaded.

I understand that you want the single-threaded jobs to share a node, whereas the parallel ones should be assigned a whole node exclusively?

multiple jobs should run at the same time on a single node.

As far as my understanding of SLURM goes, this implies that you must define CPU cores as consumable resources (i.e., SelectType=select/cons_res and SelectTypeParameters=CR_Core in slurm.conf).
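Concretely, that corresponds to lines like these in slurm.conf (changing them requires administrator access and a reconfiguration of the cluster):

# slurm.conf: treat individual CPU cores as the consumable resource
SelectType=select/cons_res
SelectTypeParameters=CR_Core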

Then, to constrain parallel jobs to a whole node you can either use the --exclusive option (but note that partition configuration takes precedence: you can't have shared nodes if the partition is configured for exclusive access), or use -N 1 --ntasks-per-node=number_of_cores_in_a_node (e.g., -N 1 --ntasks-per-node=8).

Note that the latter will only work if all nodes have the same number of cores.

None of the tasks should span multiple nodes.

This should be guaranteed by -N 1.
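Putting it together, the submissions could look like this sketch (myParallelScript.sh and mySerialScript.sh are hypothetical names, and 8 cores per node is an assumption):

# parallel job: gets a whole node to itself
sbatch --exclusive -N 1 myParallelScript.sh
# or, equivalently on a homogeneous cluster with 8-core nodes:
sbatch -N 1 --ntasks-per-node=8 myParallelScript.sh

# single-threaded job: several of these can share one node
sbatch -N 1 -n 1 mySerialScript.sh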

Upvotes: 3
