razeh

Reputation: 2765

Slurm heterogeneous job groups are holding onto the entire allocation

I'm launching a heterogeneous job in Slurm, where my batch script looks something like:

#!/bin/bash
srun --pack-group 0 short-process &
srun --pack-group 1 long-process &
wait

And my sbatch submission looks something like:

sbatch --mem-per-cpu=4g --ntasks=1 : --mem-per-cpu=2g --ntasks 1 mybash.sh

I'd hoped that short-process would release its allocation once it finished, but when I run sacct both pack groups are still listed as RUNNING, even though the first job step for pack-group 0 is listed as COMPLETED.

Is there a way to get short-process to release its Slurm allocation when it finishes?

Upvotes: 1

Views: 364

Answers (1)

damienfrancois

Reputation: 59260

You have to release it explicitly. For instance, run scancel $SLURM_JOB_ID+0 as soon as the step finishes to cancel pack-group 0:

#!/bin/bash
{ srun --pack-group 0 short-process ; scancel $SLURM_JOB_ID+0 ; } &
srun --pack-group 1 long-process &
wait
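
The same pattern can be wrapped in a small helper so that any pack group is released as soon as its step ends. This is only a sketch: the function names het_component_id and run_and_release are hypothetical, and it assumes the script runs inside the batch job so that SLURM_JOB_ID is set. The launch section is guarded so nothing runs outside an allocation.

```shell
#!/bin/bash
# Sketch: run a step in one heterogeneous-job component, then release
# that component. Helper names are hypothetical, not part of Slurm.

# Build the "<jobid>+<offset>" identifier used to address a single
# component of a heterogeneous job.
het_component_id() {
    printf '%s+%s\n' "$1" "$2"
}

# Run a command in the given pack group, cancel that component afterwards
# (whether or not the step succeeded), and return the step's exit status.
run_and_release() {
    local group=$1
    shift
    local status=0
    srun --pack-group "$group" "$@" || status=$?
    scancel "$(het_component_id "$SLURM_JOB_ID" "$group")"
    return "$status"
}

# Only launch the steps when running inside a Slurm allocation.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    run_and_release 0 short-process &
    srun --pack-group 1 long-process &
    wait
fi
```

Newer Slurm releases spell the option --het-group instead of --pack-group, so the srun calls may need adjusting to match your installation.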

Upvotes: 1
