Reputation: 121
I am trying to run more than one MPI code (e.g. 2) under the PBS queue system, across multiple nodes, as a single job.
E.g. on my cluster, 1 node = 12 procs.
I need to run 2 codes (abc1.out & abc2.out) as a single job, each code using 24 procs. Hence, I need 4x12 cores for this job, and I need some way of assigning 2x12 cores to each of the codes.
Someone suggested:
How to run several commands in one PBS job submission
which is:
(cd jobdir1; myexecutable argument1 argument2) &
(cd jobdir2; myexecutable argument1 argument2) &
wait
but it doesn't work: the codes are not distributed among all the processes.
Can GNU parallel be used? Because I read somewhere that it can't work across multiple nodes.
If it can, what's the command line for the PBS queue system?
If not, is there any software which can do this?
This is close to my final objective, which is much more complicated.
Thanks for the help.
Upvotes: 2
Views: 439
Reputation: 121
Thanks for the suggestions.
Btw, I tried GNU parallel and, so far, it only works for jobs within a single node. After some trial and error, I finally found the solution.
Suppose each node has 12 procs and you need to run 2 jobs, each requiring 24 procs.
So you can request:
#PBS -l select=4:ncpus=12:mpiprocs=12:mem=32gb:ompthreads=1
Then
sort -u "$PBS_NODEFILE" > unique-nodelist.txt    # one line per node, repetitions removed
sed -n '1,2p' unique-nodelist.txt > host.txt     # first 2 nodes -> job 1
sed 's/.*/& slots=12/' host.txt > host1.txt      # append the slot count per node
sed -n '3,4p' unique-nodelist.txt > host.txt     # next 2 nodes -> job 2
sed 's/.*/& slots=12/' host.txt > host2.txt
mv host1.txt 1/
mv host2.txt 2/
(cd 1; ./run_solver.sh) &
(cd 2; ./run_solver.sh) &
wait
What the above does is get the nodes used with repetitions removed, split them into 2 nodes for each job, then go into dir 1 and 2 and run each job using run_solver.sh.
Inside run_solver.sh for job 1 in dir 1:
...
mpirun -n 24 --hostfile host1.txt abc
Inside run_solver.sh for job 2 in dir 2:
...
mpirun -n 24 --hostfile host2.txt def
Note the different hostfile names.
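For more than 2 jobs, the same split can be generalised into a loop. A sketch only, untested on a real PBS system; for illustration it falls back to a made-up nodefile (hypothetical node names node1..node4) when $PBS_NODEFILE is unset:

```shell
#!/bin/bash
# Split the PBS nodefile into one hostfile per job, giving each job
# NODES_PER_JOB whole nodes with SLOTS cores each.
set -e
NODES_PER_JOB=2   # nodes assigned to each job
SLOTS=12          # cores per node

# Fake nodefile for illustration when not running under PBS.
if [ -z "${PBS_NODEFILE:-}" ]; then
    printf 'node1\nnode1\nnode2\nnode2\nnode3\nnode3\nnode4\nnode4\n' > fake-nodefile
    PBS_NODEFILE=fake-nodefile
fi

sort -u "$PBS_NODEFILE" > unique-nodelist.txt   # one line per node

njobs=$(( $(wc -l < unique-nodelist.txt) / NODES_PER_JOB ))
for i in $(seq 1 "$njobs"); do
    first=$(( (i - 1) * NODES_PER_JOB + 1 ))
    last=$(( i * NODES_PER_JOB ))
    mkdir -p "$i"
    # pick this job's nodes and append the slot count
    sed -n "${first},${last}p" unique-nodelist.txt \
        | sed "s/.*/& slots=$SLOTS/" > "$i/host$i.txt"
done
```

Each job i can then be launched from inside dir i with something like mpirun -n $((NODES_PER_JOB * SLOTS)) --hostfile host$i.txt.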
Upvotes: 0
Reputation: 33685
Looking at https://hpcc.umd.edu/hpcc/help/running.html#mpi it seems you need to use $PBS_NODEFILE.
Let us assume $PBS_NODEFILE contains the 4 reserved nodes. You then need a way to split these into 2x2. This will probably do:
run_one_set() {
    cat > nodefile.$$              # save the 2 node names received on stdin
    mpdboot -n 2 -f nodefile.$$    # start an MPD ring on those 2 nodes
    mpiexec -n 1 YOUR_PROGRAM
    mpdallexit                     # shut the ring down
    rm nodefile.$$
}
export -f run_one_set
cat $PBS_NODEFILE | parallel --pipe -N2 run_one_set
(Completely untested).
Upvotes: 0