user3423627

Reputation: 51

Running multiple serially dependent jobs in parallel

I am running some CFD simulations on a PBS-based cluster. I will run a large number of cases, and therefore want to do the pre-processing on the cluster nodes. I need two steps: first meshing, and when the meshing is finished, the mesh partitioning routine. To avoid manual work, I would like to program this in a PBS job script.

I can run the meshing of all cases in parallel by running the following:

#!/usr/bin/env bash
#PBS -q regular
#PBS -l nodes=1:ppn=8
#PBS -N prep_tst_2
#PBS -l walltime=6:00:00

cd $PBS_O_WORKDIR

hexp -batch -project tst_1.igg &
hexp -batch -project tst_2.igg &
hexp -batch -project tst_3.igg &
hexp -batch -project tst_4.igg &
hexp -batch -project tst_5.igg &
hexp -batch -project tst_6.igg &
hexp -batch -project tst_7.igg &
hexp -batch -project tst_8.igg &

wait   # keep the PBS job alive until all meshing processes finish

#End of script

where hexp is the meshing program.

I can also run a meshing task followed by the partitioning by running:

hexp -batch -project tst_1.igg ; partit -batch -project tst_1.igg

But how can I combine the two? I want to run 8 instances of the last command in parallel, so that when the meshing of tst_1.igg finishes, it continues with the partitioning of tst_1.igg regardless of the status of the other instances.

Best regards, Adam

Upvotes: 3

Views: 1091

Answers (2)

Steve Koch

Reputation: 932

It looks like this problem would be handled well by GNU Parallel. If I understand correctly, you want to sequentially run hexp followed by partit for a given file. You want the sequence to run in parallel for a number of files. I think you would want to use GNU Parallel as follows:

First, create a simple bash script that accepts a filename argument and launches the two commands:

#!/bin/bash
hexp -batch -project "$1" ; partit -batch -project "$1"

#name this file hexpart.sh and make it executable

Next, use GNU Parallel in your PBS script to launch hexpart.sh on multiple CPUS. In this case, eight files on 8 CPUs on one node:

#!/bin/bash
#PBS -l nodes=1:ppn=8
#Other PBS directives

cd $PBS_O_WORKDIR
module load gnu-parallel   # this will depend on your cluster setup

parallel -j8 --sshloginfile $PBS_NODEFILE --workdir $PBS_O_WORKDIR \
  "$(pwd)/hexpart.sh tst_{}.igg" ::: 1 2 3 4 5 6 7 8

#name this file launch.pbs

When you run qsub launch.pbs, the parallel command will run hexpart.sh on the eight files, each on a separate CPU. The filenames are generated by replacing the {} with the arguments after :::. Here is a tutorial for GNU Parallel.
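As a side note, if GNU Parallel is not installed on the cluster, a rough equivalent of the same idea can be sketched with plain bash in the PBS script itself, using one backgrounded subshell per case (this assumes the same tst_N.igg naming as in the question; it is a sketch, not a tested jobscript):

```shell
#!/usr/bin/env bash
#PBS -l nodes=1:ppn=8
#Other PBS directives

cd $PBS_O_WORKDIR

# One subshell per case: partit starts only after hexp for that
# same file succeeds; the eight mesh+partition pairs run in parallel.
for i in 1 2 3 4 5 6 7 8; do
  ( hexp -batch -project "tst_$i.igg" && partit -batch -project "tst_$i.igg" ) &
done
wait   # do not let the PBS job exit before all pipelines finish
```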

Upvotes: 1

dbeer

Reputation: 7213

What you are looking for are job dependencies. Let's say that your pre-processing command is placed in a script called preprocess.sh and the partitioning piece that you want to run 8 times is placed in a script called partition.sh.

jobid=$(qsub preprocess.sh)
for ((i=0; i < 8; i++)); do
  qsub -W depend=afterok:$jobid partition.sh
done

This makes the preprocess.sh script a job, and then submits 8 jobs that won't execute unless the first job exits with an exit code of zero. This will work nicely if you have the preprocess script output the results to a network file location that all compute nodes can read and you set up the partition.sh script to read from that same location.
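For the layout in the question (eight independent mesh-then-partition chains rather than one shared pre-processing step), the same dependency mechanism can be sketched per case. Here mesh.sh and partition.sh are hypothetical wrapper scripts, assumed to read the case name from a CASE environment variable passed with qsub -v; none of these names come from the original answer:

```shell
# Submit one meshing job per case, plus a partitioning job that
# only runs after that case's meshing job exits successfully.
for i in 1 2 3 4 5 6 7 8; do
  meshid=$(qsub -v CASE=tst_$i.igg mesh.sh)
  qsub -W depend=afterok:$meshid -v CASE=tst_$i.igg partition.sh
done
```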

You can read more about job dependencies in the documentation.

Upvotes: 0
