user2300940

Reputation: 2385

parallel execution of script on cluster for multiple inputs

I have a script pytonscript.py that I want to run on 500 samples. I have 50 CPUs available and want to run the script in parallel, using 1 CPU per sample, so that 50 samples are running at any time. Any ideas how to set this up without typing 500 lines with the different inputs? I know how to make a loop over the samples, but not how to keep 50 of them running in parallel at once. I guess GNU parallel is a way?

Input samples in folder samples:

sample1 sample2 sample3 ... sample500

pytonscript.py -i samples/sample1.sam.bz2 -o output_folder

Upvotes: 0

Views: 163

Answers (1)

dan

Reputation: 5221

What about GNU xargs?

printf '%s\0' samples/sample*.sam.bz2 |
xargs -0L1 -P 50 pytonscript.py -o output_dir -i

This launches a new instance of the python script for each file, concurrently, maintaining a maximum of 50 at once.
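
Since the question mentions GNU parallel: if it is installed, roughly the same thing can be done in one line (a minimal sketch, assuming the same samples/ layout and script invocation):

# -j 50 caps the number of concurrent jobs; {} is replaced by each matching file
parallel -j 50 pytonscript.py -o output_dir -i {} ::: samples/sample*.sam.bz2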

If the wildcard glob expansion isn't specific enough, you can use bash's extglob: shopt -s extglob, and then match with samples/sample+([0-9]).sam.bz2
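
In context, that filter would look like this (again just a sketch, assuming bash):

shopt -s extglob   # enable extended globs so +([0-9]) matches one or more digits
printf '%s\0' samples/sample+([0-9]).sam.bz2 |
xargs -0L1 -P 50 pytonscript.py -o output_dir -i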

Upvotes: 1
