simona

Reputation: 2181

How to prevent multiple executables from running at the same time on cluster

I have submitted a job to a multicore cluster running the LSF platform. It looks like the script at the end. The two executables, exec1 and exec2, start at the same time. My intention was that, since they are separated by a semicolon, the second would start only after the first had finished. Of course, this caused several problems and the job couldn't terminate correctly. Now that I have noticed this behavior, I am writing separate job-submission files for each executable. Can anybody explain why these executables are running at the same time?

#!/bin/bash -l
#
# Batch script for bash users 
#
#BSUB -L /bin/bash
#BSUB -n 10
#BSUB -J jobname
#BSUB -oo output.log
#BSUB -eo error.log
#BSUB -q queue
#BSUB -P project
#BSUB -R "span[hosts=1]"
#BSUB -W 4:0

source /etc/profile.d/modules.sh
module purge
module load intel_comp/c4/2013.0.028
module load hdf5/1.8.9
module load platform_mpi/8.2.1

export OMP_NUM_THREADS=1
export MP_TASK_AFFINITY=core:$OMP_NUM_THREADS
OPT="-aff=automatic:latency"

mpirun $OPT exec1; mpirun $OPT exec2

Upvotes: 0

Views: 100

Answers (1)

zazzy78

Reputation: 172

I assume that both exec1 and exec2 are MPI applications?

In theory it should work, but LSF is probably doing something odd and the mpirun for exec1 is returning before exec1 actually finishes. You could instead try:

mpirun $OPT exec1 && mpirun $OPT exec2 
  • so that mpirun $OPT exec1 has to exit with return code 0 before exec2 is launched.
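Equivalently (just a sketch, not part of the original answer), the same condition can be written as an explicit exit-code check, which makes the intent easier to see in a longer script:

#!/bin/bash -l
# Run the first MPI program and capture its exit status
mpirun $OPT exec1
status=$?

# Only launch the second program if the first one succeeded
if [ $status -eq 0 ]; then
    mpirun $OPT exec2
else
    echo "exec1 failed with exit code $status" >&2
    exit $status
fi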

However, it probably isn't a great idea to run two MPI jobs from the same script like this, since, for instance, the MPI environment-variable setup may introduce conflicts. What you should really do is use job chaining, so that exec2 is submitted as a separate job that only runs after exec1 has finished.
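For illustration, here is a minimal sketch of job chaining using LSF's dependency option (-w). The job names job1 and job2 and the file names are placeholders, not taken from the question:

# job1.sh -- submits exec1 as its own job
#BSUB -J job1
#BSUB -n 10
#BSUB -R "span[hosts=1]"
mpirun $OPT exec1

# job2.sh -- held by LSF until job1 completes successfully (DONE status)
#BSUB -J job2
#BSUB -w "done(job1)"
#BSUB -n 10
#BSUB -R "span[hosts=1]"
mpirun $OPT exec2

Submit both with bsub < job1.sh and bsub < job2.sh; LSF keeps job2 pending until the dependency condition on job1 is satisfied.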

Upvotes: 1
