user1447257


Limit the number of running jobs in SLURM

I am queuing multiple jobs in SLURM. Can I limit the number of parallel running jobs in slurm?

Thanks in advance!

Upvotes: 18

Views: 21523

Answers (6)

geneticatt

Reputation: 1

This can also be accomplished with a while loop that monitors the user's job queue. Because the line count includes the squeue header, the loop holds back the submission for as long as 6 or more jobs are already running or queued.

while [ "$(squeue -u "$(whoami)" | wc -l)" -gt 6 ]; do
    sleep 10
done
sbatch script.sh
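
The same idea can be wrapped in a small function. This is only a minimal sketch: the helper names count_jobs and submit_when_below are made up for illustration, and the 10-second polling interval is kept from the loop above. The count skips the squeue header line, so the limit passed in is the actual number of jobs.

```shell
# Count the jobs in `squeue` output read from stdin, skipping the header line.
count_jobs() {
    awk 'NR > 1 { n++ } END { print n + 0 }'
}

# Hypothetical helper: wait until fewer than max_jobs are queued or running,
# then pass the remaining arguments to sbatch.
submit_when_below() {
    local max_jobs=$1
    shift
    while [ "$(squeue -u "$(whoami)" | count_jobs)" -ge "$max_jobs" ]; do
        sleep 10
    done
    sbatch "$@"
}
```

A call such as submit_when_below 6 script.sh would then submit script.sh once fewer than 6 of your jobs are in the queue.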

Upvotes: 0

Reniel Calzada

Reputation: 124

Expanding on the accepted answer: in my case, I needed to cap the number of jobs running per node, and I had to do it using only srun (not sbatch). I solved this by combining three flags: --nodelist=<nodename> --dependency=singleton --job-name=<uniquename>_<nodename>.

First, I create an array of x unique names, where x is the maximum number of jobs I want to run per node. Second, I create an array with all the node names I want to use. Finally, I combine these two arrays cyclically: I append the node name to the unique name and make sure that the value passed to --nodelist matches the appended node name. The result limits the number of jobs running on each node rather than the total number of running jobs. I needed to distribute the jobs this way mainly because of memory constraints on each node.
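
The cyclic combination described above might be sketched as follows. Everything concrete here is an assumption for illustration: the node names (node01, node02), the slot names, and the helper srun_flags_for are hypothetical, and the node is pinned with srun's --nodelist option.

```shell
# Hypothetical values: two nodes, at most three concurrent jobs per node.
nodes=(node01 node02)
slots=(slotA slotB slotC)

# Print the srun flags for the i-th job (0-based), cycling first through
# the nodes and then through the slot names.
srun_flags_for() {
    local i=$1
    local n=${#nodes[@]} m=${#slots[@]}
    local node=${nodes[$(( i % n ))]}
    local slot=${slots[$(( (i / n) % m ))]}
    printf -- '--nodelist=%s --dependency=singleton --job-name=%s_%s\n' \
        "$node" "$slot" "$node"
}

# Example (needs a Slurm cluster; <command> is a placeholder):
#   srun $(srun_flags_for "$i") <command> &
```

With 2 nodes and 3 slot names there are 6 distinct job names, and since singleton lets only one job per name run at a time, each node runs at most 3 jobs.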

Upvotes: 0

damienfrancois

Reputation: 59340

If you are not the administrator, you can hold some jobs with scontrol hold <JOBID> if you do not want them all to start at the same time, and you can delay the start of others with sbatch --begin=YYYY-MM-DD.

Also, if it is a job array, you can limit the number of jobs in the array that run concurrently, for instance with --array=1-100%25 to have 100 jobs in the array but only 25 of them running at any time.

Finally, you can use the --dependency=singleton option that will only allow one of a set of jobs with the same --job-name to be running at a time. If you choose three names and distribute those names to all your jobs and use that option, you are effectively restricting yourself to 3 running jobs max.
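
A minimal sketch of distributing a fixed pool of names over a set of job scripts follows. The helper names singleton_name and submit_all and the slot0, slot1, ... naming scheme are assumptions for illustration; only --dependency=singleton and --job-name come from the answer.

```shell
# Hypothetical helper: print the job name for the i-th submission (0-based),
# cycling through max_running names (slot0, slot1, ...).
singleton_name() {
    local i=$1 max_running=$2
    printf 'slot%d\n' "$(( i % max_running ))"
}

# Hypothetical helper: submit every script given as an argument,
# allowing at most 3 of them to run at once.
submit_all() {
    local i=0 script
    for script in "$@"; do
        sbatch --dependency=singleton \
               --job-name="$(singleton_name "$i" 3)" "$script"
        i=$(( i + 1 ))
    done
}
```

One trade-off of this scheme is that a job only starts once the previous job with the same name has finished, even if a job under another name ended earlier.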

Upvotes: 22

lonestar21

Reputation: 1193

If your jobs are relatively similar, you can use Slurm job arrays. I had been trying to figure this out for a while and found this solution at https://docs.id.unibe.ch/ubelix/job-management-with-slurm/array-jobs-with-slurm

#!/bin/bash -x
#SBATCH --mail-type=NONE
#SBATCH --array=1-419%25  # Submit 419 tasks with only 25 of them running at any time

#contains the list of 419 commands I want to run
cmd_file=s1List_170519.txt

cmd_line=$(awk -v var="${SLURM_ARRAY_TASK_ID}" 'NR==var {print $1}' "$cmd_file")    # First field of the line whose number equals the task ID

$cmd_line  #may need to be piped to bash
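
The script above extracts only the first whitespace-separated field of the selected line. If each line of the command file holds a complete command with arguments (an assumption, not something the answer states), a variant that reads the whole line and hands it to bash could look like this sketch. The helper names nth_line and run_task are hypothetical.

```shell
# Print line number $2 of file $1 (the whole line, not just its first field).
nth_line() {
    sed -n "${2}p" "$1"
}

# Run the command stored on line $2 of file $1 through bash, so that
# pipes, quotes and redirections inside it are honoured.
run_task() {
    bash -c "$(nth_line "$1" "$2")"
}

# Inside the array job one would call (cmd_file as in the script above):
#   run_task "$cmd_file" "$SLURM_ARRAY_TASK_ID"
```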

Upvotes: 1

aerijman

Reputation: 2782

According to the SLURM documentation, --array=0-15%4 (with a - sign, not a :) limits the number of simultaneously running tasks from this job array to 4.

I wrote test.sbatch:

#!/bin/bash
# test.sbatch
#
#SBATCH -J a
#SBATCH -p campus
#SBATCH -c 1
#SBATCH -o %A_%a.output

mkdir test${SLURM_ARRAY_TASK_ID}

# Sleep for a random time of up to 10 minutes so the tasks stay visible in squeue
# and finish at different times, to check that the number of parallel tasks stays constant
RANGE=600; number=$(( RANDOM % RANGE )); echo "$number"

sleep $number

and ran it with sbatch --array=1-15%4 test.sbatch

The jobs ran as expected (always 4 in parallel): each one just created its directory and kept running for $number seconds.
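
One way to check the "always 4 in parallel" observation is to count the running tasks in the squeue output. The sketch below assumes squeue's default column layout, in which the fifth column (ST) holds the state code. The helper name count_running is made up for illustration.

```shell
# Count RUNNING jobs in default-format `squeue` output read from stdin.
# Assumes the default layout, where the 5th column (ST) is the state code.
count_running() {
    awk 'NR > 1 && $5 == "R" { n++ } END { print n + 0 }'
}

# Example (needs a Slurm cluster; <jobid> is the job ID of the array):
#   squeue -r -j <jobid> | count_running
```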

Appreciate comments and suggestions.

Upvotes: 8

AndresM

Reputation: 1373

According to the SLURM Resource Limits documentation, you can limit the total number of jobs that you can run for an association/qos with the MaxJobs parameter. As a reminder, an association is a combination of cluster, account, user name and (optional) partition name.

You should be able to do something similar to:

sacctmgr modify user <userid> account=<account_name> set MaxJobs=10

I found this presentation to be very helpful in case you have more questions.

Upvotes: 13
