SHB11

Reputation: 375

Batch script for multi-partition job?

I'm working on a project that runs programs on two different partitions of a large compute cluster. I'd like to drive this from a batch script, but after searching, it's still unclear whether (and how) I can allocate and run programs on two different partitions from within a single batch script. Here's the sort of thing I'd like to do:

#!/bin/bash
#SBATCH --partition=<WHAT GOES HERE? I want to perform 100 processes on partition "batch" and 1 process on partition "gpu". I will alternate between the 2 during my jobs execution>
#SBATCH --ntasks=<100 on batch, 1 on gpu>
#SBATCH --mem-per-cpu=2G
#SBATCH --time=4-00:00:00
#SBATCH --exclude=nodeynode[003,016,019,020-023,026-030,004-015,017-018,020,024,031]
#SBATCH --job-name="lorem_ipsum"

filenames=("name1" "name2" "name3")

srun -p gpu python gpu_init.py

for i in {1..100}
do
    for name in "${filenames[@]}"
    do
        srun -p batch pythonexecutable &
    done
    wait    # all batch tasks must finish before the gpu step
    srun -p gpu python gpu_iter.py
done

Apologies for any bash errors; I usually script in Python, but I can't here because I'm switching between Python modules (different versions) within my bash script (not shown). I saw that you can put a list of partitions in the header of a batch script, but from what I read, that just tells the scheduler to allocate any one available partition from the list, not multiple partitions at once.

Thanks!

Upvotes: 4

Views: 5179

Answers (1)

damienfrancois

Reputation: 59180

Slurm jobs are restricted to a single partition, so in your case there are several possible courses of action:

  • submit two job arrays with --array=1-100, splitting your submission script into one part for the batch partition and another for the gpu partition, and link the two arrays with --dependency=aftercorr:<job_id of the 'batch' job array>;

  • use salloc to create an allocation on the gpu partition, and then SSH explicitly to that node to run python gpu_iter.py from the submission script (if the cluster configuration permits);

  • modify gpu_iter.py so that it can be told (with UNIX signals) that it has to run and then sleep until the next signal, and use scancel --signal to signal the gpu job from within the batch job at each iteration.
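The third option can be tried out with plain shell tools before wiring it into Slurm. The sketch below is an illustration only: a background shell function stands in for the long-running gpu job, plain kill -USR1 stands in for scancel --signal=USR1 <gpu_jobid>, and the file name gpu.log is made up for the demo.

```shell
#!/bin/bash
rm -f gpu.log

# Stand-in for the gpu job: sleeps until it receives SIGUSR1,
# performs one "iteration", then goes back to sleep.
gpu_worker() {
    trap 'echo "gpu iteration" >> gpu.log' USR1
    while true; do
        sleep 1 & sleep_pid=$!
        wait "$sleep_pid"          # interrupted early when a signal arrives
        kill "$sleep_pid" 2>/dev/null
    done
}

gpu_worker &                       # in Slurm this would be the submitted gpu job
worker_pid=$!
sleep 0.5                          # let the worker install its trap first

for i in 1 2 3; do
    # ... batch-partition work for this iteration would run here ...
    kill -USR1 "$worker_pid"       # scancel --signal=USR1 <gpu_jobid> in Slurm
    sleep 0.5                      # give the worker time to handle the signal
done

kill "$worker_pid"
cat gpu.log
```

Each signal wakes the worker's wait, the trap handler runs one iteration, and the loop goes back to sleeping until the next signal.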

Update: according to this ticket, this can now be done with heterogeneous jobs.
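With heterogeneous jobs, the two partitions become two components of a single submission, roughly along these lines. This is a sketch built from the question's script, using the hetjob/--het-group syntax of recent Slurm releases (older releases spelled these packjob and --pack-group; check your site's version and documentation).

```shell
#!/bin/bash
# Component 0: the 100 tasks on the "batch" partition
#SBATCH --partition=batch
#SBATCH --ntasks=100
#SBATCH --mem-per-cpu=2G
#SBATCH --time=4-00:00:00
#SBATCH --job-name="lorem_ipsum"
#SBATCH hetjob
# Component 1: the single task on the "gpu" partition
#SBATCH --partition=gpu
#SBATCH --ntasks=1

srun --het-group=1 python gpu_init.py

for i in {1..100}
do
    srun --het-group=0 pythonexecutable &
    wait    # batch tasks finish before the gpu step
    srun --het-group=1 python gpu_iter.py
done
```

Each srun then targets one component of the allocation with --het-group, so the script can alternate between the partitions without separate jobs or signalling.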

Upvotes: 4

Related Questions