roblanf

Reputation: 1803

Use GNU parallel to parallelise a bash for loop

I have a for loop which runs a Python script ~100 times, once on each of 100 different input folders. The Python script is most efficient on 2 cores, and I have 50 cores available, so I'd like to use GNU parallel to run the script on 25 folders at a time.

Here's my for loop (works fine, but is sequential of course). The Python script takes a bunch of input arguments, including -p 2, which runs it on two cores:

for folder in $(find /home/rob/PartitionFinder/ -maxdepth 2 -type d); do
        python script.py --raxml --quick --no-ml-tree $folder --force -p 2
done

and here's my attempt to parallelise it, which does not work:

folders=$(find /home/rob/PartitionFinder/ -maxdepth 2 -type d)

echo $folders | parallel -P 25 python script.py --raxml --quick --no-ml-tree {} --force -p 2

The issue I'm hitting (perhaps it's just the first of many though) is that my folders variable is not a list, so it's really just passing a long string of 100 folders as the {} to the script.

All hints gratefully received.

Upvotes: 3

Views: 1452

Answers (3)

Till

Reputation: 4523

You can create a Makefile like this:

#!/usr/bin/make -f

FOLDERS=$(shell find /home/rob/PartitionFinder/ -maxdepth 2 -type d)

all: ${FOLDERS}

# To execute the find before the all
find_folders:
	@ echo $(FOLDERS) > /dev/null

${FOLDERS}: find_folders
	@ python script.py --raxml --quick --no-ml-tree $@ --force -p 2

Then run make -j 25 to run up to 25 jobs at a time.

Be careful: recipe lines in a Makefile must be indented with a tab, not spaces.

Also, folders with spaces in their names won't work.

Upvotes: 0

user4815162342

Reputation: 154886

Replace echo $folders | parallel ... with echo "$folders" | parallel ....

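Without the double quotes, the shell splits the value of $folders on whitespace (newlines included) and passes each piece to echo as a separate argument; echo then prints them all on one line, separated by spaces. parallel treats each input line as the argument for one job, so it sees a single job whose argument is the whole list.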
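
To see the difference without running anything expensive, here is a minimal sketch that counts the lines echo produces in each case. The three paths are made-up stand-ins for the output of find, not the real folders.

```shell
# Stand-in for the output of find: three hypothetical paths, one per line.
folders=$(printf '%s\n' /data/run1 /data/run2 /data/run3)

# Unquoted: the shell splits the value into three words and echo joins
# them with spaces, so everything ends up on a single line.
unquoted_lines=$(echo $folders | wc -l)

# Quoted: the embedded newlines survive, so there is one line per path.
quoted_lines=$(echo "$folders" | wc -l)

echo "unquoted: $unquoted_lines line(s)"
echo "quoted:   $quoted_lines line(s)"
```

In bash this should report 1 line for the unquoted form and 3 for the quoted one, which is why parallel started a single job with the whole list as its argument.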

To avoid such quoting issues altogether, it is always a good idea to pipe find to parallel directly, and use the null character as the delimiter:

find ... -print0 | parallel -0 ...

This will work even when encountering file names that contain multiple spaces or a newline character.
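
As a rough illustration of why the null delimiter is safer, the sketch below builds a throwaway directory tree (the names are invented for the demo) and counts how many records a consumer would see with newline-delimited versus null-delimited output. It only counts records and does not call parallel itself.

```shell
# Throwaway sandbox so the demo does not touch real data.
sandbox=$(mktemp -d)
awkward=$(printf 'two\nlines')   # a directory name containing a newline
mkdir "$sandbox/plain" "$sandbox/with space" "$sandbox/$awkward"

# Newline-delimited: the awkward name is broken across two lines,
# so a line-based reader sees one record too many.
by_line=$(find "$sandbox" -mindepth 1 -type d | wc -l)

# NUL-delimited: exactly one terminator per directory, whatever the name.
by_nul=$(find "$sandbox" -mindepth 1 -type d -print0 | tr -cd '\0' | wc -c)

echo "records when split on newlines: $by_line"
echo "records when split on NUL:      $by_nul"

rm -rf "$sandbox"
```

With three directories, the newline count comes out as 4 and the NUL count as 3; a reader that splits on NUL, as parallel -0 does, gets one record per directory.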

Upvotes: 6

olisch

Reputation: 990

You can pipe find directly to parallel:

find /home/rob/PartitionFinder/ -maxdepth 2 -type d | parallel -P 25 python script.py --raxml --quick --no-ml-tree {} --force -p 2

If you want to keep the list in $folders, you can pipe the echo through xargs -n 1, which prints each whitespace-separated word on its own line:

echo $folders | xargs -n 1 | parallel -P 25 python script.py --raxml --quick --no-ml-tree {} --force -p 2
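
One caveat with this approach, sketched below with made-up paths: because both the unquoted $folders and xargs split on whitespace, a folder name containing a space is broken into separate lines, and parallel would then receive the pieces as separate jobs.

```shell
# Two hypothetical folders, one of which has a space in its name.
folders=$(printf '%s\n' '/data/run 1' /data/run2)

# xargs -n 1 with no command runs echo once per word,
# so "/data/run 1" is emitted as two separate lines.
pieces=$(echo $folders | xargs -n 1 | wc -l)

echo "folders: 2, lines produced: $pieces"
```

Here the two folders turn into 3 lines, so this variant is only safe when none of the paths contain whitespace.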

Upvotes: 4
