Zahaib Akhtar

Reputation: 1078

GNU Parallel: Parallelizing a script over a cluster, where the script writes files to the Master node

I have a simple bash script that takes as input a text file listing directory names. It visits these directories one by one, writes the output of pwd to a file, and moves that file to a results directory. I can easily parallelize this script on my 4-core machine using GNU Parallel. The bash script (myScript.sh) is given below:

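#!/bin/bash

par_func (){
    name=$1
    # Quote paths so names with spaces survive; skip this name if cd fails
    cd "/home/zahaib/parentFolder/$name" || return 1
    pwd > "$name.txt"
    mv "$name.txt" /home/zahaib/result/
    cd /home/zahaib/parentFolder
}

# Make the function visible to the shells that GNU Parallel spawns locally
export -f par_func
# Run one par_func call per line of folderList.txt, up to 10 at a time
parallel -a /home/zahaib/folderList.txt -j 10 par_func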
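One way to check the per-directory logic before involving the cluster is to run it against a scratch tree. The sketch below is a hypothetical variant of par_func (named par_func_local) with the two hard-coded paths replaced by the hypothetical variables BASE and RESULT. It calls the function once per line of a list file, serially, which is the same one-line-per-call mapping that parallel -a applies concurrently.

```shell
#!/bin/bash
# Sketch: same logic as par_func, but the hard-coded paths are swapped for
# hypothetical variables so it can run in a throwaway directory tree.
set -u

BASE=$(mktemp -d)     # hypothetical stand-in for /home/zahaib/parentFolder
RESULT=$(mktemp -d)   # hypothetical stand-in for /home/zahaib/result

par_func_local() {
    local name=$1
    # Subshell: the cd cannot leak into the caller, so no "cd back" is needed
    (
        cd "$BASE/$name" || exit 1
        pwd > "$name.txt"
        mv "$name.txt" "$RESULT/"
    )
}

# Build a small tree and a list file shaped like folderList.txt
mkdir -p "$BASE/docs" "$BASE/pics" "$BASE/music"
printf '%s\n' docs pics music > "$BASE/list.txt"

# One call per line, serially; parallel -a does the same mapping concurrently
while IFS= read -r name; do
    par_func_local "$name"
done < "$BASE/list.txt"
```

Afterwards RESULT holds docs.txt, pics.txt and music.txt, each containing the matching directory path. Under GNU Parallel each job already runs in its own shell, so the trailing cd in the original function is harmless but not required.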

Now I want to parallelize the same script on a cluster. All the worker nodes have mounted the Master node's home directory, so I can see the output of ls /home/zahaib/ on every worker node.

I tried using --env to export par_func. I also have a list of worker nodes in a workerList.txt file. My initial idea was to invoke parallel by replacing the last line of the script above with the following:

parallel -vv --env par_func --slf /home/zahaib/workerList.txt -a /home/zahaib/folderList.txt -j 10 par_func 

However, this does not seem to work: the shell on the Master node just hangs after I run ./myScript.sh. What am I missing here?

The contents of my folderList.txt are as follows:

docs
dnload
driver
pics
music
.
.

and the contents of my workerList.txt are as follows:

2//usr/bin/ssh zahaib@node-1
2//usr/bin/ssh zahaib@node-2
2//usr/bin/ssh zahaib@node-3
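Each line of workerList.txt uses GNU Parallel's sshlogin syntax, which (per the --sshlogin section of its man page) is an optional ncpu/ prefix, then an optional ssh command with its options, then [user@]host. The double slash in 2//usr/bin/ssh is just that prefix separator followed by an absolute path. The hypothetical helper parse_sshlogin below splits a line into those parts as a reading aid only; it is not how GNU Parallel itself parses the file, and it ignores less common forms such as hostgroups.

```shell
#!/bin/bash
# Hypothetical helper: split one sshlogin line of the form
#   [ncpu/][sshcommand [options] ]user@host
# into its parts. Reading aid only; GNU Parallel does its own parsing.
parse_sshlogin() {
    local line=$1 ncpu="" rest
    local re='^([0-9]+)/(.*)$'
    # A leading run of digits followed by '/' is the CPU count for that host
    if [[ $line =~ $re ]]; then
        ncpu=${BASH_REMATCH[1]}
        rest=${BASH_REMATCH[2]}
    else
        rest=$line
    fi
    # The last space-separated word is the login; anything before it is
    # the ssh command (plus any options)
    local login=${rest##* }
    local sshcmd=""
    if [[ $rest == *" "* ]]; then
        sshcmd=${rest% *}
    fi
    # "auto" = no prefix given; "ssh" = no explicit ssh command given
    printf 'ncpu=%s ssh=%s login=%s\n' "${ncpu:-auto}" "${sshcmd:-ssh}" "$login"
}

while IFS= read -r line; do
    [[ -z $line ]] && continue
    parse_sshlogin "$line"
done <<'EOF'
2//usr/bin/ssh zahaib@node-1
2//usr/bin/ssh zahaib@node-2
2//usr/bin/ssh zahaib@node-3
EOF
```

For the first line this prints ncpu=2 ssh=/usr/bin/ssh login=zahaib@node-1.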

Upvotes: 1

Views: 198

Answers (1)

Ole Tange

Reputation: 33685

From your description you are doing the right thing, so you may have hit a bug.

Try reducing workerList.txt and folderList.txt to a minimal example, and then run:

parallel -D ...

(Also check out the --results option, which might be useful to you.)
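A minimal reproducer can be built by copying the first line or two of each list into scratch files. In the sketch below, the helper shrink_list and the file names workerMini.txt and folderMini.txt are hypothetical, and the stand-in lists are created in a temporary directory so the sketch runs anywhere; on the Master node you would point it at the real files. The debug invocation itself stays exactly as given above.

```shell
#!/bin/bash
# Hypothetical helper: shrink_list SRC N DEST copies the first N lines of
# SRC into DEST, giving a tiny input for a debug run.
shrink_list() {
    head -n "$2" "$1" > "$3"
}

tmp=$(mktemp -d)
# Stand-ins for /home/zahaib/workerList.txt and /home/zahaib/folderList.txt
printf '%s\n' '2//usr/bin/ssh zahaib@node-1' '2//usr/bin/ssh zahaib@node-2' > "$tmp/workerList.txt"
printf '%s\n' docs dnload driver pics music > "$tmp/folderList.txt"

shrink_list "$tmp/workerList.txt" 1 "$tmp/workerMini.txt"   # one worker
shrink_list "$tmp/folderList.txt" 2 "$tmp/folderMini.txt"   # two folders

# Pass these to --slf and -a in the debug run instead of the full lists
wc -l "$tmp/workerMini.txt" "$tmp/folderMini.txt"
```

With one worker and two folders, any hang or error in the debug output is much easier to tie to a specific host or directory.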

Upvotes: 1
