Reputation: 1078
I have a simple bash script which takes as input a list of directory names in a text file. It traverses these directories one by one, copies the output of pwd to a file, and moves that file to a results directory. I can easily parallelize this script on my 4-core machine using GNU Parallel. The bash script (myScript.sh) is given below:
#!/bin/bash
par_func () {
    name="$1"
    # work inside the named folder so pwd records its absolute path;
    # bail out if the folder does not exist instead of writing a wrong path
    cd /home/zahaib/parentFolder/"$name" || return
    pwd > "$name".txt
    mv "$name".txt /home/zahaib/result/
    cd /home/zahaib/parentFolder
}
export -f par_func
parallel -a /home/zahaib/folderList.txt -j 10 par_func
Now I want to parallelize the same script on a cluster. All the worker nodes have the Master node's home directory mounted, so ls /home/zahaib/ shows the same contents on every worker node.
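For example, the shared mount can be confirmed from one of the workers like this (node-1 here is one of the nodes listed in workerList.txt further below):
ssh zahaib@node-1 ls /home/zahaib
# should print the same listing as on the Master node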
I tried using the --env option to export par_func. I also have a list of worker nodes in a workerList.txt file. My initial idea was to invoke parallel by replacing the last line of the script above with the following:
parallel -vv --env par_func --slf /home/zahaib/workerList.txt -a /home/zahaib/folderList.txt -j 10 par_func
However, this does not seem to work: the shell on the Master node just hangs after I run ./myScript.sh. What am I missing here?
The contents of my folderList.txt are as follows:
docs
dnload
driver
pics
music
.
.
and the contents of my workerList.txt are as follows:
2//usr/bin/ssh zahaib@node-1
2//usr/bin/ssh zahaib@node-2
2//usr/bin/ssh zahaib@node-3
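For debugging, ssh access to all the workers in this file can be verified with GNU Parallel's --nonall option, which runs a command once per sshlogin without reading any input (a quick sketch):
parallel --nonall --slf /home/zahaib/workerList.txt hostname
# if this prints all three node names, the --slf file and ssh logins work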
Upvotes: 1
Views: 198
Reputation: 33685
From your description you are doing the right thing, so you may have hit a bug.
Try minimizing workerList.txt and folderList.txt and then run:
parallel -D ...
(Also check out the --result option, which might be useful to you.)
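For example, a minimized run could look like this (a sketch; the one-line temporary files are only for illustration, and par_func must still be exported in the calling shell):
head -n 1 /home/zahaib/folderList.txt > /tmp/folderOne.txt
head -n 1 /home/zahaib/workerList.txt > /tmp/workerOne.txt
parallel -D --env par_func --slf /tmp/workerOne.txt -a /tmp/folderOne.txt -j 1 par_func
With a single worker and a single job, the -D output is short enough to see where the remote invocation stalls.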
Upvotes: 1