user3240688

Reputation: 1327

Slurm - How to use all available CPUs for independent tasks?

My question is similar to this question

Make use of all CPUs on SLURM

Long story short, I want to use all available CPU cores, over as many nodes as possible.

The difference is that instead of a single job that's an MPI program, my job consists of N independent tasks, of 1 core per task. N could potentially be greater than the total number of available cores, in which case some tasks would just need to wait.

For example, say I have a cluster of 32 cores. And say I'd like to run the same program (worker_script.sh), 100 times, each with different input. Each call to worker_script.sh is a task. I would like the first 32 tasks to run, while the remaining 68 tasks would be queued. When cores free up, the later tasks would run. Eventually, my job is considered finished when all tasks are done running.
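
For reference, assume worker_script.sh is roughly a single-core script that takes an index as its only argument, along these lines (simplified; my_program and the input file name are just placeholders):

#!/bin/bash
# simplified stand-in for the real worker: one core, one index argument
idx=$1
echo "task $idx starting on $(hostname)"
./my_program --input "input_${idx}.dat"   # placeholder for the actual single-core work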

What is the proper way to do that? I wrote the following script and invoked it with sbatch, but it just runs everything on the same core, so it ends up taking forever.

#!/bin/bash
ctr=0
while [[ $ctr -lt 100 ]]; do 
   srun worker_script.sh $ctr &
   ((ctr++))
done

wait

Alternatively, I can invoke the above script directly (without sbatch), and that seems to do the trick: it takes over all 32 cores and queues up everything else. When cores free up, they get allocated to the remaining calls to worker_script.sh. Eventually, all 100 jobs finish, out of order of course, as expected.

The difference is that instead of 1 job of 100 tasks, it was 100 jobs of 1 task each.

Is there a reason I can't do 100 independent tasks? Am I fundamentally wrong to begin with? Should I be doing 100 jobs instead of 100 tasks?

Upvotes: 2

Views: 2993

Answers (2)

Marcus Boden

Reputation: 1685

If you submit that script via sbatch without requesting more resources, Slurm allocates a single task to the job, and inside the job every srun call is limited to the job's resources. This is why your calculations run sequentially when you submit it via sbatch.

If you just run the script without sbatch, each call to srun creates a new job every time (as you already noticed), so it is not limited to a single task.
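
You can see this in the job's own allocation; a quick check from inside the batch script (just a sketch using scontrol's standard job fields) would be:

# run inside the job submitted with a plain "sbatch script.sh"
scontrol show job "$SLURM_JOB_ID" | grep -E 'NumNodes|NumCPUs|NumTasks'
# typically reports NumCPUs=1 NumTasks=1, so all srun steps share that
# single CPU and effectively run one after another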

Is there a reason I can't do 100 independent tasks? Am I fundamentally wrong to begin with? Should I be doing 100 jobs instead of 100 tasks?

In the end, it is a bit of personal preference which way you prefer. You can have a single job with 100 tasks:

#!/bin/bash
#SBATCH -n 32                  # request 32 tasks for this one job
ctr=0
while [[ $ctr -lt 100 ]]; do 
   srun -n 1 worker_script.sh $ctr &   # each step consumes one of the 32 tasks
   ((ctr++))
done

wait                           # wait for all background steps to finish

This will allocate 32 tasks, and each srun call consumes one of them; the remaining calls wait until a task frees up inside the job. Disadvantage: you will need to wait for 32 cores to be free at once, which means you will likely wait longer in the queue.
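
Submitting and monitoring it is the usual sbatch workflow (file name here is arbitrary):

# all_tasks.sh is the 32-task script above (the name is arbitrary)
jid=$(sbatch --parsable all_tasks.sh)     # --parsable prints just the job ID
squeue -j "$jid"                          # one job; steps start as cores free up
sacct -j "$jid" -o JobID,State,Elapsed    # per-step accounting afterwards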

A better way (in my opinion) is to use a job array:

#!/bin/bash
#SBATCH -a 0-99%32
worker_script.sh $SLURM_ARRAY_TASK_ID

This creates a job array with 100 jobs, of which at most 32 run simultaneously. If you don't need or want that limit, just remove the %32 part from the #SBATCH parameter. Why is this better? If your tasks are completely independent, there's no real need to have them all in one job, and this way a task can run as soon as a slot is free anywhere. This should reduce the time in the queue to a minimum.

Additionally, using job arrays is elegant and puts less load on the scheduler. Your admins will likely prefer having a large job array over numerous identical jobs submitted in a for-loop.
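
Since each task needs different input, a common pattern is to let the array index pick the input, e.g. from a line-per-input file (a sketch; inputs.txt and the output file pattern are assumptions, not something from the question):

#!/bin/bash
#SBATCH -a 0-99%32
#SBATCH -o worker_%A_%a.out            # one log per array task (%A = job ID, %a = index)
# line SLURM_ARRAY_TASK_ID+1 of inputs.txt holds this task's input
input=$(sed -n "$((SLURM_ARRAY_TASK_ID + 1))p" inputs.txt)
worker_script.sh "$input"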

Upvotes: 2

Maarten-vd-Sande

Reputation: 3701

Take a look at sbatch instead of srun (see the sbatch documentation).

#!/bin/bash
ctr=0
while [[ $ctr -lt 100 ]]; do 
   # sbatch options go before the script name; anything after it is passed to the script
   sbatch -n 1 worker_script.sh $ctr
   ((ctr++))
done

srun is interactive and blocks until the work finishes, while sbatch submits the job to the cluster and writes stdout/stderr to a file.
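
If worker_script.sh is a plain script rather than a batch script (i.e. sbatch complains about it), the --wrap option can be used inside the same loop to build a minimal batch job around the command (a sketch):

# inside the loop above, as an alternative submission line:
sbatch -n 1 --wrap "worker_script.sh $ctr"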

Upvotes: 0
