d_kennetz

Reputation: 5359

slurm - sbatch job array for parallel execution of serial jobs filling each node using wrap command

I have a file with one command per line like:

myprog.sh <args abc>
myprog.sh <args def>

I am trying to submit these as jobs in an array. The array should use one host and all of the CPUs on that host. Instead, the two array tasks are landing on two different hosts, and both commands are being executed on each of them.

What I'd like:

host1
  - myprog.sh <args abc> slurm-1_1.out
  - myprog.sh <args def> slurm-1_2.out

slurm-1_1.out would have the stdout from job 1, and slurm-1_2.out would have the stdout from job 2. Instead, what is happening is this:

host1
  - myprog.sh <args abc> slurm-1_1.out
  - myprog.sh <args def> slurm-1_1.out
host2
  - myprog.sh <args abc> slurm-1_2.out
  - myprog.sh <args def> slurm-1_2.out

So the full pair of commands is being duplicated on each host.

My sbatch command looks like:

sbatch -p small --nodes=1 -w host1 --cpus-per-task=1 --mem-per-cpu=3G --array=1-2 --wrap="myprog.sh <args abc> & myprog.sh <args def> & wait"

Any help would be appreciated!

Upvotes: 0

Views: 795

Answers (1)

Marcus Boden

Reputation: 1695

Array jobs are for completely independent workloads. You want the jobs running concurrently on the same machine, so an array doesn't fit that case. You could go for a single job with two tasks though:

#!/bin/bash
#SBATCH -p small
#SBATCH --nodes=1
#SBATCH --ntasks=2
# If you want access to all CPUs and memory on the node:
#SBATCH --exclusive

# Launch both commands as separate tasks in parallel and wait for both to finish
srun -n 1 myprog.sh <args abc> > first.out &
srun -n 1 myprog.sh <args def> > second.out &
wait
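
To submit, you would save this as a jobscript and hand it to sbatch (the filename here is just a placeholder):

sbatch jobscript.sh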

Depending on your cluster setup, you might need the --exact flag and explicit memory specifications for srun; this behaviour changed a bit in recent Slurm versions. You should be able to do this in one line with a --wrap command as well, but as that is not really readable, I prefer to write the jobscript.
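
For reference, a rough sketch of what that one-liner could look like (untested, keeping the placeholder arguments from the question and assuming --exclusive is acceptable on your partition):

sbatch -p small --nodes=1 --ntasks=2 --exclusive --wrap="srun -n 1 myprog.sh <args abc> > first.out & srun -n 1 myprog.sh <args def> > second.out & wait"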

Upvotes: 1
