yasir

Reputation: 123

HPC: Submitting multiple independent serial jobs across nodes

I have 4 directories (named 1, 2, 3, 4). Each one contains an executable compiled from C code, named submit. Using #PBS -l select=2:ncpus=2 gives me 4 workers (2 on node-1 and 2 on node-2).

Task: I need to run each of the 4 executables independently, one per directory.

#PBS -l select=2:ncpus=2
./1/submit&
./2/submit&
./3/submit&
./4/submit&

The forking method above only uses node-1: all 4 jobs are shared between the 2 workers of node-1 and nothing ever runs on node-2.

#PBS -l select=2:ncpus=2

mpirun -np 1 -machinefile $PBS_NODEFILE ./1/submit&
mpirun -np 1 -machinefile $PBS_NODEFILE ./2/submit&
mpirun -np 1 -machinefile $PBS_NODEFILE ./3/submit&
mpirun -np 1 -machinefile $PBS_NODEFILE ./4/submit&

I tried using mpirun, but it still runs everything on the node-1 workers. Is there any way to divide the jobs between the nodes?

Updates to the question after Ole Tange's answer

(1) The directory structure and its contents are as follows:

ParentDirectory contains the PBS file "sub.sh" and the sub-directories 1, 2, 3, 4. Each sub-directory contains a submit file, which is an executable compiled with the icc compiler. submit is a molecular dynamics code that writes its output files into the directory from which the job is started.

(2) Running jobs on 1 node, 4 cores ==> 4 processes in total:

sub.sh has the contents,

#PBS -l select=1:ncpus=4
cd 1;./submit&
cd ../2;./submit&
cd ../3;./submit&
cd ../4;./submit&

sub.sh is submitted from the parent directory; it then enters each directory in turn and starts one process per folder. The resulting files are therefore generated inside each of the directories 1, 2, 3, 4 without any interference from the other directories or processes. The resulting video looks correct.

(3) Running jobs using gnu-parallel on 2 nodes, 2 cores each ==> 4 processes in total:

sub.sh has the contents,

#PBS -l select=2:ncpus=2
seq 4 | parallel --wd . -S 2/"$node1" -S 2/"$node2" ./exx
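
The line above does not show where $node1 and $node2 are set. As a minimal sketch, assuming the job runs under PBS and that $PBS_NODEFILE (the same file passed to mpirun earlier) lists the allocated hosts one per line, possibly with repeats, they could be filled as follows. list_nodes is a hypothetical helper name, not part of PBS.

```shell
# Print each distinct hostname from a PBS node file, in first-seen order.
# list_nodes is a hypothetical helper name, not part of PBS.
list_nodes() {
    # NF skips blank lines; seen[] drops repeated hostnames.
    awk 'NF && !seen[$0]++' "$1"
}

# Inside a PBS job, PBS_NODEFILE points at the list of allocated hosts.
# Outside a job the variable is unset and this part is skipped.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    node1=$(list_nodes "$PBS_NODEFILE" | sed -n 1p)
    node2=$(list_nodes "$PBS_NODEFILE" | sed -n 2p)
    echo "node1=$node1 node2=$node2"
fi
```

Removing duplicates matters because some schedulers repeat a hostname once per allocated core, while the 2/ prefix in -S 2/"$node1" already tells parallel how many jobs to run on that host.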

exx has the contents

cd 1;./submit&
cd ../2;./submit&
cd ../3;./submit&
cd ../4;./submit&

sub.sh is submitted from the parent directory. After submitting sub.sh, I saw that jobs were running in each of the folders 1, 2, 3, 4 and generating files inside the directories, and the speed is comparable to the serial code, which means that all 4 workers are at least doing something. But when I make a video from the results of one folder it looks strange: the blue swimmer oscillates a lot, which I think might be caused by a race condition (video).

Surely something strange is going on between the processes, but I don't know what.

Upvotes: 2

Views: 369

Answers (1)

Ole Tange

Reputation: 33685

Something like:

seq 4 | parallel --wd . -S 2/node1 -S 2/node2 ./{}/submit
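
A possible refinement, offered as a sketch rather than a tested recipe. The question says submit writes its files into the directory it is started from, so ./{}/submit launched from the parent directory would presumably write there instead of into 1, 2, 3, 4. Note also that the exx script in the question starts all four directories each time it is called, so calling it once per input value starts several copies of submit in every directory at once. A small wrapper that runs exactly one case inside its own directory avoids both. The names run_case and run_case.sh are hypothetical.

```shell
#!/bin/sh
# run_case.sh (hypothetical name): run ONE case inside its own directory.

run_case() {
    # The subshell keeps the cd local, so the caller's working
    # directory is unchanged; submit only runs if the cd succeeds.
    ( cd "$1" && ./submit )
}

# When called with an argument (e.g. ./run_case.sh 3), run that case.
if [ "$#" -gt 0 ]; then
    run_case "$1"
fi
```

Saved as run_case.sh in the parent directory and made executable, it could take the place of ./{}/submit in the command above:

seq 4 | parallel --wd . -S 2/node1 -S 2/node2 ./run_case.sh {}

Each input value then starts a single submit process, with its working directory set to the matching folder.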

Upvotes: 1
