Sky Scraper

How does a SLURM job array interact with SBATCH resource allocation?

#!/bin/bash
#SBATCH -p RM-shared
#SBATCH -n 4
#SBATCH -t 24:00:00
#SBATCH --array=1-

I am trying to start a job array, and I would like each task in the array to use 4 cores on the RM-shared partition. Am I doing this correctly, or does this mean that ALL of the tasks in the array will have to share 4 cores?

I will ask a separate question about this, but for some reason, when I run this, the $SLURM_ARRAY_TASK_ID variable is empty.

When I run

echo "My SLURM_ARRAY_TASK_ID: " $SLURM_ARRAY_TASK_ID

after the #SBATCH header lines that set up the job, it prints

My SLURM_ARRAY_TASK_ID:

Upvotes: 0

Answers (2)

Sky Scraper

I wasn't calling the script properly: I was running ./ThisScript.sh instead of sbatch ./ThisScript.sh.
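A small guard can make this mistake obvious: when the script is started directly with ./ThisScript.sh, Slurm has not set SLURM_ARRAY_TASK_ID, so the script can refuse to continue instead of silently running with an empty ID. This is a minimal sketch; the helper name require_array_job is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: detect that the script was run directly
# (./ThisScript.sh) rather than submitted with sbatch.
require_array_job() {
    if [ -z "${SLURM_ARRAY_TASK_ID:-}" ]; then
        echo "SLURM_ARRAY_TASK_ID is not set; submit this script with: sbatch $0" >&2
        return 1
    fi
    return 0
}

# In the job script, right after the #SBATCH header lines:
#   require_array_job || exit 1
#   echo "My SLURM_ARRAY_TASK_ID: $SLURM_ARRAY_TASK_ID"
```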

Regarding the allocation of cores per array job, a help-desk person told me to use #SBATCH --ntasks-per-node=4 instead of #SBATCH --cpus-per-task=4.

But I don't know why. I would expect --ntasks-per-node=4 to mean that each node runs only 4 jobs, so a 12-job array would require 3 full nodes.

--cpus-per-task=4, on the other hand, I would expect to mean that each CPU (each of which hosts a number of cores) runs only 4 tasks, so a 12-job array would require 3 CPUs (and, if the nodes on your system have 3 or more CPUs, only 1 node).
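Instead of guessing what each option does, one way to check is to print the allocation-related environment variables that Slurm sets inside every array job and compare the output for the two header variants. This is a minimal sketch; the function name report_allocation is made up for illustration, and any variable Slurm did not set prints as "unset" (for example, SLURM_CPUS_PER_TASK is only set when --cpus-per-task is requested).

```shell
#!/bin/bash
# Hypothetical helper: print what Slurm granted to this job (one array element).
# Variables that Slurm did not set are printed as "unset".
report_allocation() {
    echo "SLURM_ARRAY_TASK_ID=${SLURM_ARRAY_TASK_ID:-unset}"
    echo "SLURM_NTASKS=${SLURM_NTASKS:-unset}"
    echo "SLURM_CPUS_PER_TASK=${SLURM_CPUS_PER_TASK:-unset}"
    echo "SLURM_CPUS_ON_NODE=${SLURM_CPUS_ON_NODE:-unset}"
    echo "SLURM_JOB_NODELIST=${SLURM_JOB_NODELIST:-unset}"
}

# Call it near the top of the job script, after the #SBATCH lines.
report_allocation
```

Submitting the same script once with --cpus-per-task=4 and once with --ntasks-per-node=4, then comparing the job output files, shows directly which request gives each array element its four CPUs.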

Upvotes: 0

damienfrancois

First, you are right about using --cpus-per-task=4 rather than --ntasks. Second, it could be a copy/paste error, but your --array line is incomplete:

#SBATCH --array=1-

should be

#SBATCH --array=1-10

for instance, for a 10-job array.

Each job in the array will have 4 distinct cores allocated to it, and the jobs will be scheduled independently. For instance, all 10 could start at the same time on a single 40-core node, or at the same time on 10 distinct nodes, or one at a time on a single 4-core node, or in any combination in between, depending on the cluster configuration and the other jobs in the queue.
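Putting this together, a corrected version of the question's script might look like the sketch below: one task per array element, four CPUs for that task, and a complete --array range. Here --cpus-per-task=4 requests four CPUs for each job's single task; it is not a limit on how many tasks a CPU runs. The partition name and time limit are copied from the question; the input_N.dat naming, the run_one_task function, and ./my_program are hypothetical placeholders.

```shell
#!/bin/bash
#SBATCH -p RM-shared
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH -t 24:00:00
#SBATCH --array=1-10

# Work done by ONE array element. Each element is a separate job,
# scheduled on its own, with its own allocation of 4 CPUs.
run_one_task() {
    task_id="$1"
    ncpus="$2"
    input="input_${task_id}.dat"    # hypothetical input naming scheme
    echo "task ${task_id}: ${ncpus} CPUs, input ${input}"
    # Replace with the real work, e.g. a hypothetical threaded program:
    #   OMP_NUM_THREADS="$ncpus" ./my_program "$input"
}

# Slurm sets SLURM_ARRAY_TASK_ID only for array jobs, and
# SLURM_CPUS_PER_TASK only when --cpus-per-task is requested.
if [ -n "${SLURM_ARRAY_TASK_ID:-}" ]; then
    run_one_task "$SLURM_ARRAY_TASK_ID" "${SLURM_CPUS_PER_TASK:-1}"
else
    echo "Not running inside a Slurm array job; submit with: sbatch $0" >&2
fi
```

Submitted with sbatch ./ThisScript.sh, this should create ten independent jobs, and element 3 should print "task 3: 4 CPUs, input input_3.dat".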

Upvotes: 1
