Reputation: 3255
I am trying to figure out what the concept of "tasks" means in SLURM. I found this answer on SO that suggests the following job script:
#!/bin/bash
#SBATCH --ntasks=2
srun --ntasks=1 sleep 10 &
srun --ntasks=1 sleep 12 &
wait
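(Setting Slurm aside for a moment, the & / wait pattern itself is plain bash: backgrounded commands run concurrently and wait blocks until all of them have exited. A local sketch with shorter sleeps and no srun — using GNU date's %N for millisecond timing — shows the overlapping behavior the author describes:)

```shell
#!/bin/bash
# Two backgrounded sleeps plus wait: the elapsed time tracks the
# longest sleep, not the sum, because they run concurrently.
start=$(date +%s%N)                 # nanoseconds (GNU date)
sleep 2 &
sleep 3 &
wait                                # blocks until both sleeps exit
end=$(date +%s%N)
elapsed_ms=$(( (end - start) / 1000000 ))
echo "elapsed: ${elapsed_ms} ms"    # roughly 3000 ms, not 5000
```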
The author says that this job runs for him in 12 seconds in total, because the two steps sleep 10 and sleep 12 run in parallel, but I cannot reproduce that.
If I save the above file as slurm-test and run sbatch -o slurm.out slurm-test, I see that my job runs for 22 seconds.
This is the output of sacct --format=JobID,Start,End,Elapsed,NCPUS -S now-2minutes:
JobID Start End Elapsed NCPUS
------------ ------------------- ------------------- ---------- ----------
645514 2021-06-30T11:05:38 2021-06-30T11:06:00 00:00:22 2
645514.batch 2021-06-30T11:05:38 2021-06-30T11:06:00 00:00:22 2
645514.exte+ 2021-06-30T11:05:38 2021-06-30T11:06:00 00:00:22 2
645514.0 2021-06-30T11:05:38 2021-06-30T11:05:48 00:00:10 2
645514.1 2021-06-30T11:05:48 2021-06-30T11:06:00 00:00:12 2
My slurm.out output is:
srun: Job 645514 step creation temporarily disabled, retrying (Requested nodes are busy)
srun: Step created for job 645514
Explicitly including -n 2 in the sbatch call does not change the result. What am I doing wrong? How can I get the two calls in my job file to run simultaneously?
Upvotes: 5
Views: 4190
Reputation: 401
For me, the reason for step creation temporarily disabled, retrying (Requested nodes are busy) is that the srun command that executed first allocated all of the memory. To solve this, first (optionally?) specify the total memory allocation in sbatch:
#SBATCH --ntasks=2
#SBATCH --mem=[XXXX]MB
And then specify the memory use per srun task:
srun --exclusive --ntasks=1 --mem-per-cpu [XXXX/2]MB sleep 10 &
srun --exclusive --ntasks=1 --mem-per-cpu [XXXX/2]MB sleep 12 &
wait
I didn't specify a CPU count for srun because in my sbatch script I have #SBATCH --cpus-per-task=1. For the same reason I suspect you should use --mem instead of --mem-per-cpu in the srun command when your job isn't serial, but I haven't tested this configuration.
Upvotes: 4
Reputation: 59340
Depending on the Slurm version, you might have to add the --exclusive parameter to srun (which has different semantics there than for sbatch):
#!/bin/bash
#SBATCH --ntasks=2
srun --ntasks=1 --exclusive -c 1 sleep 10 &
srun --ntasks=1 --exclusive -c 1 sleep 12 &
wait
Also adding -c 1 to be more explicit might help, again depending on the Slurm version.
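(Whichever fix applies, you can confirm that the steps actually overlapped from the accounting data, as in the question — the job ID 645514 here stands in for your own:)

```shell
sacct -j 645514 --format=JobID,Start,End,Elapsed
```

If the steps ran in parallel, the .0 and .1 steps should show the same Start time, and the job's total Elapsed should drop to roughly the longer sleep (12 seconds) instead of the sum.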
Upvotes: 4