itsmrbeltre

Reputation: 442

SLURM: How can I run different executables in parallel on the same compute node or on different nodes?

Goal:

  1. learn how to run or co-schedule multiple executables/applications within a single sbatch job submission
  2. using either srun or mpirun

Research:

Code snippet:

 #!/bin/bash
 #SBATCH --job-name LEBT 
 #SBATCH --partition=angel
 #SBATCH --nodelist=node38
 #SBATCH --sockets-per-node=1
 #SBATCH --cores-per-socket=1
 #SBATCH --time 00:10:00 
 #SBATCH --output LEBT.out

 # load the MPI environment (srun itself is provided by SLURM)
 module load openmpi


 srun  -n 1   ./LU.exe -i 100 -s 100  &
 srun  -n 1   ./BT.exe  &

 wait 

Man Pages:

 [srun]-->[https://computing.llnl.gov/tutorials/linux_clusters/man/srun.txt]

 [mpirun]-->[https://www.open-mpi.org/doc/v1.8/man1/mpirun.1.php]

Upvotes: 4

Views: 5175

Answers (2)

damienfrancois

Reputation: 59180

Your script will work modulo a minor modification. If you do not care whether your processes run on the same node or not, add #SBATCH --ntasks=2:

#!/bin/bash
#SBATCH --job-name LEBT 
#SBATCH --ntasks=2
#SBATCH --partition=angel
#SBATCH --nodelist=node38
#SBATCH --sockets-per-node=1
#SBATCH --cores-per-socket=1
#SBATCH --time 00:10:00 
#SBATCH --output LEBT.out

# load the MPI environment (srun itself is provided by SLURM)
module load openmpi

srun  -n 1 --exclusive  ./LU.exe -i 100 -s 100  &
srun  -n 1 --exclusive  ./BT.exe  &

wait 

The --exclusive argument tells srun to run with only a subset of the whole allocation; see the srun manpage.
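To see what --exclusive buys you, you can time two trivial steps: on many Slurm versions, concurrent steps launched without --exclusive each claim the whole allocation and therefore run one after the other. A minimal sketch under the same #SBATCH header as above (the sleep commands and the timings are illustrative assumptions, not from the original scripts):

# with --exclusive, each step is confined to one task's resources,
# so the two sleeps overlap and the job takes roughly 10 seconds;
# without it, the steps may serialize to roughly 20 seconds
srun -n 1 --exclusive sleep 10 &
srun -n 1 --exclusive sleep 10 &

wait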

If you want both processes to run on the same node, use --cpus-per-task=2:

#!/bin/bash
#SBATCH --job-name LEBT 
#SBATCH --cpus-per-task=2
#SBATCH --partition=angel
#SBATCH --nodelist=node38
#SBATCH --sockets-per-node=1
#SBATCH --cores-per-socket=1
#SBATCH --time 00:10:00 
#SBATCH --output LEBT.out

# load the MPI environment (srun itself is provided by SLURM)
module load openmpi

srun  -c 1 --exclusive  ./LU.exe -i 100 -s 100  &
srun  -c 1 --exclusive  ./BT.exe  &

wait 

Note that, in that case, you must run srun with -c 1 rather than with -n 1.
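A quick way to verify the placement is to substitute hostname for the real executables inside the same allocation; both steps should then print the same node name (node38 here). A minimal sketch, reusing the #SBATCH header above:

srun -c 1 --exclusive hostname &
srun -c 1 --exclusive hostname &

wait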

Upvotes: 5

itsmrbeltre

Reputation: 442

After extensive research, I have concluded that srun is the command you want to use to run jobs in parallel. Moreover, you need a helper script to adequately execute the whole process. I wrote the following script, which executes the applications on one node with no problem.

#!/usr/bin/python
#SBATCH --job-name TPython
#SBATCH --output=ALL.out
#SBATCH --partition=magneto
#SBATCH --nodelist=node1

import threading
import os

addlock = threading.Lock()

class jobs_queue(threading.Thread):
    def __init__(self, job):
        threading.Thread.__init__(self, args=(addlock,))
        self.job = job

    def run(self):
        self.job_executor(self.job)

    def job_executor(self, cmd):
        # launch the job step and block this thread until it exits
        os.system(cmd)

if __name__ == "__main__":

    joblist = ["srun ./executable2",
               "srun ./executable1 -i 20 -s 20"]

    # create one thread per job step
    threads = [jobs_queue(job) for job in joblist]

    # start all job steps concurrently
    [t.start() for t in threads]

    # wait for every job step to finish
    [t.join() for t in threads]

In my particular case, with the flags shown, each executable takes around 55 seconds when run on its own. However, when they were run in parallel, each took around 59 seconds.
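If job accounting is enabled on the cluster (an assumption; not every site runs it), you can confirm that the two steps actually overlapped by comparing their start and end times, where 1234 stands in for the real job id:

sacct -j 1234 --format=JobID,JobName,Start,End,Elapsed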

Upvotes: 0
