MattM

Reputation: 337

SLURM does not ensure all jobs finish in a bash script despite remaining wall time

I'm executing several python scripts using SLURM in the following format:

General SLURM header (partition, wall-time, etc.)

python script.py scenario_1 & 
python script.py scenario_2 & 
python script.py scenario_3 &
python script.py scenario_4

I'm discovering that the order I specify these jobs to be completed matters. If scenario_4 (or, more generally, the last job) finishes before other jobs, the remaining jobs will not complete. I am able to simply organize my jobs by duration, although in many cases I do not know the relative compute time which makes this estimation imperfect. Is there a way to ensure that SLURM doesn't prematurely kill jobs?

Upvotes: 0

Views: 483

Answers (1)

damienfrancois

Reputation: 59110

Written like this, the submission script waits only for scenario_4; as soon as it finishes, the script exits and Slurm kills any background processes that are still running.

Slurm considers a job finished when the submission script terminates, and the submission script terminates once all of its foreground commands are done; commands started in the background (those ending in &) are not waited for.

Add the wait command at the end, like this:

python script.py scenario_1 & 
python script.py scenario_2 & 
python script.py scenario_3 &
python script.py scenario_4 &
wait

The wait command forces the submission script to wait for all background jobs to finish before it exits.
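For illustration, here is a minimal, self-contained sketch of the same pattern, using sleep and false as stand-ins for the python invocations (hypothetical placeholders, not part of the original script). It also shows how waiting on each PID individually lets you detect whether any background job failed:

```shell
#!/bin/bash
# Stand-ins for "python script.py scenario_N": two succeed, one fails.
pids=()
sleep 1 & pids+=($!)
sleep 2 & pids+=($!)
false   & pids+=($!)   # simulates a scenario that exits non-zero

status=0
for pid in "${pids[@]}"; do
    # `wait PID` returns that job's exit code; remember the last failure.
    wait "$pid" || status=$?
done
echo "overall status: $status"
```

A bare `wait` with no arguments (as in the answer above) is enough to keep the job alive until everything finishes; the per-PID form is only needed if you want the script's exit status to reflect failures in the background jobs.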

Upvotes: 3
