Peter

Reputation: 7

Submitting an array of jobs on SLURM

I am trying to submit an array of jobs on SLURM, but the sleep command doesn't work as expected. I would like to launch a new job every 10 seconds; instead, every job in the array starts at once and each one just waits 10 seconds at the end. How should I modify the following batch file?

#!/usr/bin/env bash
# The name to show in queue lists for this job:
#SBATCH -J matlab.sh

# Number of desired cpus:
#SBATCH --cpus-per-task=1
#SBATCH --mem=8gb

# The time the job will be running:
#SBATCH --time=167:00:00

# To use GPUs you have to request them:
##SBATCH --gres=gpu:1

# If you need nodes with special features uncomment the desired constraint line:
##SBATCH --constraint=bigmem
#SBATCH --constraint=cal
##SBATCH --constraint=slim

# Set output and error files
#SBATCH --error=job.%J.err
#SBATCH --output=job.%J.out

# MAKE AN ARRAY JOB, SLURM_ARRAYID will take values from 1 to 60
#SARRAY --range=1-60

# To load some software (you can show the list with 'module avail'):
module load matlab

export from=400
export to=1000


export steps=60

mkdir temp_${SLURM_ARRAYID}
cd temp_${SLURM_ARRAYID}
# the program to execute with its parameters:
matlab < ../SS.m  > output_temp_${SLURM_ARRAYID}.out
sleep 10

Upvotes: 0

Views: 3333

Answers (2)

markhahn

Reputation: 551

You should almost certainly not have sleep in your job script. All it does is occupy the job's resources without getting any work done - pure waste.

Job arrays are just a submission shorthand: each member of the array has the same overhead as a standalone job. The only difference is that the array job sits in the queue somewhat like a Python "generator": every time the scheduler scans the queue and resources are available, another array member is budded off as a standalone job.

That's why the sleep makes no sense: it runs inside the job, not at submission time. Slurm has no syntax for throttling a job array by time, only by the maximum number of simultaneously running tasks.

But why not just "for n in {1..60}; do sbatch script $n; sleep 10; done"?
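Spelled out, that loop might look like the sketch below. The script name `script.sh` is a placeholder; `echo` stands in for the real `sbatch` call, and the `sleep` is commented out so the dry run finishes instantly:

```shell
#!/usr/bin/env bash
# Throttled submission: one sbatch call per array index, paced by sleep.
# Dry run - echo prints each command instead of actually submitting it.
for n in {1..60}; do
  echo sbatch script.sh "$n"   # replace echo with the real sbatch call
  # sleep 10                   # uncomment to pace the real submissions
done
```

Each iteration submits one independent job, so the pacing happens at submission time rather than inside the job itself.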

I'm a cluster admin, and I'm fine with this. You're trying to be kind to the scheduler, which is good. Every 10 seconds is overkill, though - the scheduler can probably take a job per second without breaking a sweat. I'd rather you think more carefully about whether the "shape" of each job makes sense: GPU jobs can often use more than one CPU core, and is the job efficient in the first place? There are also plenty of ways to tune for cases where your program (matlab) can't keep a GPU busy, such as MPS and MIG.

Upvotes: 0

Gilles Gouaillardet

Reputation: 8395

From the documentation

A maximum number of simultaneously running tasks from the job array may be specified using a "%" separator. For example "--array=0-15%4" will limit the number of simultaneously running tasks from this job array to 4.

So if you want to submit a job array of 60 jobs but run only one job at a time, updating your submission script like this should do the trick:

#SBATCH --array=1-60%1
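For reference, with the `--array` syntax the task index is exposed as `SLURM_ARRAY_TASK_ID` rather than `SLURM_ARRAYID`, so the working-directory lines from the question would change accordingly. A sketch of the relevant fragment (not a complete script; the rest of the header stays as in the question):

```shell
#SBATCH --array=1-60%1          # 60 tasks, at most 1 running at a time

mkdir -p temp_${SLURM_ARRAY_TASK_ID}
cd temp_${SLURM_ARRAY_TASK_ID}
matlab < ../SS.m > output_temp_${SLURM_ARRAY_TASK_ID}.out
```

The `%1` throttle replaces the sleep entirely: the scheduler itself ensures the next task only starts once the previous one has finished.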

Upvotes: 1
