Loonuh

Reputation: 141

How to utilize parallel nodes through Shell Script?

I am trying to utilize parallel nodes to run numerical simulations. I have Nodes #0 through 12 and I wish to utilize them each individually to run a separate part of the simulation. Essentially, I need to evaluate f(x) for x=1 through 4 on one node, then f(x) for x=5 through 9 on the next node, then f(x) for x=10 through 14 on the next one, and so on from there. Initially, I tried using a loop like:

n=0
while [ $n -le 12 ]
do
   ssh compute-0-$n
   #evaluate the f(x) for the x values that I want
   exit
   n=$(($n+1))
done

But this did not work: whenever I used the ssh compute-0-$n command to jump to a node, the connection to the original shell script seemed to cease, and when I would exit the node, the shell script continued along its merry way. I suppose there is a better way to accomplish this, but I am relatively new to this. Can anyone help?

Upvotes: 2

Views: 1536

Answers (3)

Ole Tange

Reputation: 33685

GNU Parallel is made for exactly this kind of task.

evaluate_f() {
  x="$1"
  # do some crazy computation
}
seq 48 | env_parallel --env evaluate_f -Snode{1..12} evaluate_f {}

If the machines are not really called node1 .. node12, then it becomes a bit longer:

seq 48 | env_parallel --env evaluate_f -Snode1,nodeb,nodeIII,node0100,node0x5,node6,nodeg,nodeVIII,node01001,node0xa,node11,nodel evaluate_f {}

If you have the nodes in a file:

seq 48 | env_parallel --env evaluate_f --slf my_nodefile evaluate_f {}
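For reference, an --slf node file is simply one ssh login per line; GNU Parallel also accepts an optional CPU-count prefix per line. A minimal sketch, using hypothetical hostnames:

```
# my_nodefile: one sshlogin per line
compute-0-0
compute-0-1
# prefix with N/ to limit this node to N job slots
8/compute-0-2
# log in as a different user on this node
otheruser@compute-0-3
```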

What this does is copy the function evaluate_f to the remote servers and run it there with one argument from seq 48. By default it will run one job per CPU core on each server. This makes sense if your computation is not multithreaded and does not have a lot of disk I/O. This can be changed with --jobs.

env_parallel was introduced in version 20160322, so make sure your version is at least that new.

GNU Parallel is a general parallelizer and makes it easy to run jobs in parallel on the same machine or on multiple machines you have ssh access to.

If you have 32 different jobs you want to run on 4 CPUs, a straightforward way to parallelize is to run 8 jobs on each CPU:

(figure: Simple scheduling)

GNU Parallel instead spawns a new process when one finishes, keeping the CPUs active and thus saving time:

(figure: GNU Parallel scheduling)

Installation

You should install GNU Parallel with your package manager, but if GNU Parallel is not packaged for your distribution, you can do a personal installation, which does not require root access. It can be done in 10 seconds by doing this:

(wget -O - pi.dk/3 || curl pi.dk/3/ || fetch -o - http://pi.dk/3) | bash

For other installation options see http://git.savannah.gnu.org/cgit/parallel.git/tree/README

Learn more

See more examples: http://www.gnu.org/software/parallel/man.html

Watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1

Walk through the tutorial: http://www.gnu.org/software/parallel/parallel_tutorial.html

Sign up for the email list to get support: https://lists.gnu.org/mailman/listinfo/parallel

Upvotes: 1

BobS

Reputation: 2648

The first thing to understand is that when you run ssh (without the &), ssh itself runs until completion. It opens up a new shell on the remote host, and reads commands -- but not the commands from the script that launched it. The ssh session has no knowledge of the script that launched it; it's waiting for commands from stdin.

You need to do three things:

  1. Take all the code from inside your loop after the ssh line, and put it into its own script (call it docompute.sh).
  2. Put that script on each compute node, in a directory listed in the executing user's $PATH, and
  3. in the parent script, replace everything in the loop with ssh compute-0-$n docompute.sh &. The & will get you the parallelism you want, by running the ssh process in the background.

See running same script over many machines for discussion of something quite similar. The use of & to run the command in the background is key there.
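Putting the three steps together, a minimal sketch of the parent script might look like this. It assumes docompute.sh (a hypothetical name) already exists on every node's $PATH and takes the node index as its argument, so each node can pick its own slice of x values:

```shell
#!/bin/sh
# Launch docompute.sh on compute-0-0 through compute-0-12 in parallel.
launch_all() {
    n=0
    while [ "$n" -le 12 ]
    do
        # The trailing & runs each ssh in the background, so the loop
        # moves on to the next node without waiting for this one.
        ssh "compute-0-$n" "docompute.sh $n" &
        n=$((n+1))
    done
    wait    # block until all 13 background ssh jobs have finished
}
```

Without the final wait, the parent script would exit while the remote jobs were still running.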

Upvotes: 1

Houcheng

Reputation: 2894

If you are on Ubuntu, you could use the odp program.

This program uses parallel ssh to run commands simultaneously. Users only need to write their data center configuration and scripts into a config file, then use this program to execute them in parallel.

Here is the URL: http://sourceforge.net/projects/odp/

Upvotes: 0
