Reputation: 155
I am new to SLURM. I have a multi-stage job that needs to run on a cluster whose jobs are managed by SLURM. Specifically, I want to schedule a job which:
Each step could be run using a separate bash script, while the execution of the scripts and the transitions between stages are coordinated by a master node.
I know how to use SLURM to allocate nodes and call a single command or script on each of them (which runs as a stand-alone job on each node). But as soon as the command or script finishes on a node, that node returns to the pool of free resources and is no longer part of the allocation for my job. The use case above, however, involves several stages/scripts and needs coordination between them.
I am wondering what the correct way is to design/run a set of scripts for such a use case, using SLURM. Any suggestion or example would be extremely helpful, and highly appreciated.
Upvotes: 0
Views: 530
Reputation: 59072
You simply need to wrap all your scripts in a single submission script; the nodes stay allocated to your job until that script exits:
#!/bin/bash
#SBATCH --nodes=4 --exclusive
# Setting Bash to exit whenever a command exits with a non-zero status.
set -e
set -o pipefail
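echo "Installing software on each of $SLURM_NODELIST"
# srun launches the command on the allocated nodes.
srun ./install.sh
# Commands without srun run only on the node that executes this batch script.
echo "Creating database instance"
./createDBInstance.sh "$SLURM_NODELIST"
echo "Loading DB"
./loadDB.sh params
echo "Benchmarking"
./benchmarks.sh params
echo "Done."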
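One thing to watch for: $SLURM_NODELIST is in SLURM's compact form (something like node[01-04]), so a script that needs individual host names has to expand it first. Here is a minimal sketch using scontrol show hostnames. The function names expand_nodes and first_node are made up for illustration, and treating the first node as the coordinating one is only an assumption; whether createDBInstance.sh wants the compact list or plain host names depends on how you write it.

```shell
# Hypothetical helpers; the names are illustrative, not part of SLURM.

# Expand a compact node list into one host name per line.
expand_nodes() {
    scontrol show hostnames "$1"
}

# Pick the first host in the list, e.g. to act as the coordinating node
# (an assumption; any node of the allocation could play that role).
first_node() {
    expand_nodes "$1" | head -n 1
}

# Inside the batch script this could be used as:
#   master=$(first_node "$SLURM_NODELIST")
#   echo "Coordinating from $master"
```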
You'll need to fill in the blanks... Make sure that your scripts follow the convention of exiting with a non-zero status on error.
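If you want the job log to show which stage failed, you could wrap each call in a small helper. This is only a sketch: run_stage is a hypothetical name, not something SLURM provides, and it relies on each stage script returning a non-zero status on error as described above.

```shell
# Hypothetical helper: run one named stage, log the outcome, and pass the
# stage's exit status back to the caller.
run_stage() {
    local name=$1
    shift
    echo "[stage] $name: starting"
    if "$@"; then
        echo "[stage] $name: ok"
    else
        # $? here still holds the exit status of the failed stage command.
        local status=$?
        echo "[stage] $name: failed with status $status" >&2
        return "$status"
    fi
}

# Inside the batch script this could be used as:
#   run_stage install srun ./install.sh
#   run_stage load ./loadDB.sh params
```

With set -e in effect, a non-zero return from run_stage stops the batch script, so later stages are not started after a failure.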
Upvotes: 2