Reputation: 442
Goal: run two executables in parallel within a single Slurm job on one node.
Research:
Code snippet:
#!/bin/bash
#SBATCH --job-name=LEBT
#SBATCH --partition=angel
#SBATCH --nodelist=node38
#SBATCH --sockets-per-node=1
#SBATCH --cores-per-socket=1
#SBATCH --time=00:10:00
#SBATCH --output=LEBT.out

# load the MPI environment (srun itself is provided by Slurm)
module load openmpi

# launch both programs as background job steps, then wait for both
srun -n 1 ./LU.exe -i 100 -s 100 &
srun -n 1 ./BT.exe &
wait
Man Pages:
srun: https://computing.llnl.gov/tutorials/linux_clusters/man/srun.txt
mpirun: https://www.open-mpi.org/doc/v1.8/man1/mpirun.1.php
Upvotes: 4
Views: 5175
Reputation: 59180
Your script will work modulo a minor modification. If you do not care whether your processes run on the same node or not, add #SBATCH --ntasks=2:
#!/bin/bash
#SBATCH --job-name=LEBT
#SBATCH --ntasks=2
#SBATCH --partition=angel
#SBATCH --nodelist=node38
#SBATCH --sockets-per-node=1
#SBATCH --cores-per-socket=1
#SBATCH --time=00:10:00
#SBATCH --output=LEBT.out

# load the MPI environment (srun itself is provided by Slurm)
module load openmpi

# each srun starts one job step; --exclusive keeps the steps on distinct CPUs
srun -n 1 --exclusive ./LU.exe -i 100 -s 100 &
srun -n 1 --exclusive ./BT.exe &
wait
The --exclusive argument tells srun to run with a distinct subset of the whole allocation; see the srun manpage for details.
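As a quick sanity check (not part of the original answer), sacct can list the job's steps afterwards and show whether they overlapped in time; the job ID below is a placeholder:

# list the steps of job 12345 (hypothetical ID) with their timings
sacct -j 12345 --format=JobID,JobName,Start,Elapsed,State

Rows 12345.0 and 12345.1 with overlapping start times confirm that the two steps ran concurrently.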
If you want both processes to run on the same node, use --cpus-per-task=2:
#!/bin/bash
#SBATCH --job-name=LEBT
#SBATCH --cpus-per-task=2
#SBATCH --partition=angel
#SBATCH --nodelist=node38
#SBATCH --sockets-per-node=1
#SBATCH --cores-per-socket=1
#SBATCH --time=00:10:00
#SBATCH --output=LEBT.out

# load the MPI environment (srun itself is provided by Slurm)
module load openmpi

# with a single two-CPU task, each step claims one CPU via -c 1
srun -c 1 --exclusive ./LU.exe -i 100 -s 100 &
srun -c 1 --exclusive ./BT.exe &
wait
Note that, in that case, you must run srun with -c 1 rather than with -n 1.
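The same pattern generalizes to more than two programs. A minimal sketch, assuming the --ntasks variant above and a hypothetical third executable; the unquoted $cmd is deliberate so each command string splits into a program and its arguments:

# command list is illustrative; #SBATCH --ntasks must match its length
cmds=("./LU.exe -i 100 -s 100" "./BT.exe" "./CG.exe")
for cmd in "${cmds[@]}"; do
    srun -n 1 --exclusive $cmd &
done
wait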
Upvotes: 5
Reputation: 442
After extensive research, I have concluded that srun is the command you want to use to run jobs in parallel. Moreover, you need a helper script to adequately execute the whole process. I wrote the following script, which executes the applications on one node with no problem.
#!/usr/bin/python
#SBATCH --job-name=TPython
#SBATCH --output=ALL.out
#SBATCH --partition=magneto
#SBATCH --nodelist=node1

import threading
import os

addlock = threading.Lock()

class jobs_queue(threading.Thread):
    def __init__(self, job):
        threading.Thread.__init__(self, args=(addlock,))
        self.job = job

    def run(self):
        self.job_executor(self.job)

    def job_executor(self, cmd):
        # hand the command string to the shell
        os.system(cmd)

if __name__ == "__main__":
    joblist = ["srun ./executable2",
               "srun ./executable1 -i 20 -s 20"]

    # create one thread per job
    threads = [jobs_queue(job) for job in joblist]

    # start all jobs
    [t.start() for t in threads]

    # wait until every job has finished
    [t.join() for t in threads]
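Slurm reads the #SBATCH lines from the submitted file whatever the shebang interpreter is, so this helper is submitted like any batch script; the filename and the --ntasks value below are assumptions, following the first answer:

sbatch --ntasks=2 helper.py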
In my particular case, with these particular flags, each executable takes around 55 seconds when run alone. When they run in parallel, each takes around 59 seconds.
Upvotes: 0