Reputation: 1803
I have a for loop which runs a Python script ~100 times on 100 different input folders. The python script is most efficient on 2 cores, and I have 50 cores available. So I'd like to use GNU parallel to run the script on 25 folders at a time.
Here's my for loop (works fine, but is sequential of course). The Python script takes a bunch of input variables, including -p 2, which runs it on two cores:
for folder in $(find /home/rob/PartitionFinder/ -maxdepth 2 -type d); do
python script.py --raxml --quick --no-ml-tree $folder --force -p 2
done
and here's my attempt to parallelise it, which does not work:
folders=$(find /home/rob/PartitionFinder/ -maxdepth 2 -type d)
echo $folders | parallel -P 25 python script.py --raxml --quick --no-ml-tree {} --force -p 2
The issue I'm hitting (perhaps it's just the first of many though) is that my folders variable is not a list, so it's really just passing a long string of 100 folders as the {} to the script.
All hints gratefully received.
Upvotes: 3
Views: 1452
Reputation: 4523
You can create a Makefile like this:
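#!/usr/bin/make -f
FOLDERS=$(shell find /home/rob/PartitionFinder/ -maxdepth 2 -type d)
# Neither target is a real file, so always treat them as out of date
.PHONY: all find_folders
all: ${FOLDERS}
# To execute the find before the all
find_folders:
	@ echo $(FOLDERS) > /dev/null
# One job per folder; $@ expands to the folder being built
${FOLDERS}: find_folders
	@ python script.py --raxml --quick --no-ml-tree $@ --force -p 2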
and then run make -j 25
Be careful: use tabs to indent in your file
Also, files with spaces in the name won't work.
Upvotes: 0
Reputation: 154886
Replace echo $folders | parallel ... with echo "$folders" | parallel ...
Without the double quotes, the shell splits $folders on whitespace (newlines included) and passes the pieces as separate arguments to echo, which prints them all on one line. parallel runs one job per input line, so it receives a single line containing every folder.
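The difference can be checked without running parallel at all. This is a minimal sketch, assuming a POSIX-style shell such as bash; the three /tmp paths are hypothetical stand-ins for the output of find.

```shell
# Hypothetical stand-in for the output of find: three paths, one per line.
folders=$(printf '%s\n' /tmp/a /tmp/b /tmp/c)

# Unquoted: word splitting turns the newlines into argument separators,
# so echo prints everything on a single line.
unquoted_lines=$(echo $folders | wc -l | tr -d ' ')

# Quoted: the newlines survive, so there is one line per folder.
quoted_lines=$(echo "$folders" | wc -l | tr -d ' ')

echo "unquoted: $unquoted_lines line(s), quoted: $quoted_lines line(s)"
```

The number of lines is the number of jobs parallel would start, which is why the unquoted form runs the script only once.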
To avoid such quoting issues altogether, it is a good idea to pipe find to parallel directly, and use the null character as the delimiter:
find ... -print0 | parallel -0 ...
This will work even when encountering file names that contain multiple spaces or a newline character.
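To see why the null delimiter matters, the sketch below counts records both ways in a scratch directory. It assumes a find that supports -mindepth, -maxdepth and -print0 (GNU and BSD find do); the directory names are made up for illustration.

```shell
# Scratch directory with one ordinary name and one awkward name (both hypothetical).
tmp=$(mktemp -d)
mkdir "$tmp/plain" "$tmp/$(printf 'with\nnewline')"

# Newline-delimited: the awkward name is split across two lines,
# so a line-based reader would see 3 records for 2 directories.
newline_records=$(find "$tmp" -mindepth 1 -maxdepth 1 -type d | wc -l | tr -d ' ')

# NUL-delimited: exactly one NUL per directory, which is the separator
# that parallel -0 splits on.
nul_records=$(find "$tmp" -mindepth 1 -maxdepth 1 -type d -print0 | tr -cd '\0' | wc -c | tr -d ' ')

echo "newline-delimited: $newline_records, NUL-delimited: $nul_records"
rm -rf "$tmp"
```

With newline-delimited input, the directory whose name contains a newline would be handed to the script as two bogus paths; with -print0 and -0 it arrives as one argument.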
Upvotes: 6
Reputation: 990
You can pipe find directly to parallel:
find /home/rob/PartitionFinder/ -maxdepth 2 -type d | parallel -P 25 python script.py --raxml --quick --no-ml-tree {} --force -p 2
If you want to keep the string in $folders, you can pipe the echo to xargs, which puts each folder back on its own line:
echo $folders | xargs -n 1 | parallel -P 25 python script.py --raxml --quick --no-ml-tree {} --force -p 2
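The effect of the xargs -n 1 stage can be checked on its own. A small sketch, assuming a POSIX-style shell and hypothetical paths without spaces: with no command given, xargs runs echo once per argument, so each word lands on its own line.

```shell
# Hypothetical stand-in for the output of find.
folders=$(printf '%s\n' /tmp/a /tmp/b /tmp/c)

# Unquoted echo flattens the list to one line; xargs -n 1 then runs its
# default command (echo) once per word, restoring one path per line.
one_per_line=$(echo $folders | xargs -n 1)
line_count=$(printf '%s\n' "$one_per_line" | wc -l | tr -d ' ')

printf '%s\n' "$one_per_line"
echo "lines: $line_count"
```

Because both the unquoted echo and xargs split on whitespace, this only works when no folder name contains a space.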
Upvotes: 4