Reputation: 2782
My question is based on THIS question.
I should consider using --array=0-60000%200
to limit the number of jobs running in parallel to 200 in SLURM. It seems to me that it takes up to a minute to launch a new job every time an old job finishes. Given the number of jobs I am planning to run, I might be wasting a lot of time this way.
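For reference, my understanding is that the array approach would be submitted with a script along these lines (a sketch; process.sh and its argument are placeholders for my actual payload):
#!/bin/bash
#SBATCH --array=0-60000%200   # indices 0..60000, at most 200 tasks running at once

# run one task of the batch; process.sh stands in for my actual job
./process.sh "$SLURM_ARRAY_TASK_ID"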
I wrote a (most probably very inefficient) alternative: a script that launches the jobs one by one, checking the number of jobs in the queue and submitting a new job if I am still below the maximum number of jobs allowed; while I am at the maximum number of parallel jobs, it sleeps for 5 seconds. It looks as follows:
#!/bin/bash
# iterate the procedure $1 times. $1=60000
for ((i=0;i<=$1;i++))
do
  # wait until any queued job has finished
  q=$(squeue -u myuserName | wc -l) # I don't care about +/-1 lines (e.g. the header)
  while [ "$q" -gt 200 ] # max number of parallel jobs set to 200
  do
    sleep 5
    q=$(squeue -u myuserName | wc -l)
  done
  # submit the job with sbatch
  sbatch...
done
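(As a side note, I suppose the count could be made exact by skipping the header and counting only my own pending and running jobs, e.g. with squeue's --noheader flag; a variant I have not tested:)
q=$(squeue --noheader -u myuserName -t pending,running | wc -l)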
It seems to do a better job than my previous method; nevertheless, I would like to know how inefficient this implementation really is, and why. Could I be harming the scheduling efficiency of other users on the same cluster?
Thank you.
Upvotes: 1
Views: 456
Reputation: 5762
SLURM needs some time to process the job list and decide which job should run next, especially if the backfill scheduler is in place and there are lots of jobs in the queue. You are not losing that minute because you use a job array; it is SLURM that needs a minute to decide, and it will need the same minute for any other job of any other user, with or without job arrays.
Your approach also makes your jobs lose priority: every time one of your jobs finishes, you launch a new one, and that new job will be the last in the queue. In addition, SLURM has to manage some hundreds of independent jobs instead of a single one that accounts for the 60000 you need.
If you are alone on the cluster, there may be no big difference between the two approaches, but if your cluster is full, your manual approach puts a slightly higher load on SLURM, and your jobs will finish considerably later than with the job array approach (simply because once the array reaches the front of the queue, all 60000 tasks are first in line, whereas with the manual approach each new job starts at the back of the queue every time one of your jobs finishes).
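Note that SLURM also tracks a pending array compactly as a single queue record; if you want to see the individual tasks, squeue can expand them with its -r/--array option (assuming a reasonably recent SLURM):
squeue -u myuserName      # pending array tasks collapsed into a single line
squeue -u myuserName -r   # one line per array task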
Upvotes: 2