Reputation: 123
I currently have a bash script, script.sh, with two nested loops. The first enumerates possible values for a, and the second enumerates possible values for b, like this:
#!/bin/bash
for a in {1..10}
do
    for b in {1..10}
    do
        nohup python script.py $a $b &
    done
done
So this spawns 100 Python processes running script.py, one for each (a, b) pair. However, my machine only has 5 cores, so I want to cap the number of concurrent processes at 5 to avoid thrashing and wasteful context switching. The goal is to always have 5 processes running until all 100 jobs are done.
xargs seems to be one way to do this, but I don't know how to pass these arguments to xargs. I've checked other similar questions but don't understand the surrounding bash jargon well enough to know what's happening. For example, I tried
seq 1 | xargs -i --max-procs=5 bash script.sh
but this doesn't seem to do anything - script.sh runs as before and still spawns off 100 processes.
I assume I'm misunderstanding how xargs works.
Thanks!
Upvotes: 2
Views: 1979
Reputation: 295363
This would actually look more like:
#!/bin/bash
for a in {1..10}; do
    for b in {1..10}; do
        printf '%s\0' "$a" "$b"
    done
done | xargs -0 -x -n 2 -P 5 python script.py
Note that there's no nohup, nor any & -- to track the number of concurrent invocations, xargs needs to be directly executing the Python script, and that process can't exit until it's complete.

The non-standard (but widely available) -0 extension requires input to be in NUL-delimited form (as created with printf '%s\0'); this ensures correct behavior with arguments having spaces, quotes, backslashes, etc.

The likewise non-standard -P 5 sets the maximum number of processes (in a way slightly more portable than --max-procs=5, which is supported on GNU but not modern BSD xargs).

The -n 2 indicates that each instance of the Python script receives only two arguments, thus starting one per pair of inputs.

The -x (used in conjunction with -n 2) indicates that if a single Python instance can't be given two arguments (for instance, if the arguments are so long that both can't fit on a single command line), this should be treated as a failure, rather than invoking a Python instance with only one argument.
Upvotes: 5
Reputation: 33685
GNU Parallel is made for exactly these kinds of jobs:
parallel python script.py ::: {1..10} ::: {1..10}
If you need $a and $b placed differently you can use {1} and {2} to refer to the two input sources:
parallel python script.py --option-a {1} --option-b {2} ::: {1..10} ::: {1..10}
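One note relevant to the question: by default GNU Parallel runs one job per CPU core, which already matches a 5-core machine; if you want to pin the limit explicitly, the -j option sets the number of simultaneous jobs (a minimal sketch):

parallel -j5 python script.py ::: {1..10} ::: {1..10}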
GNU Parallel is a general parallelizer and makes it easy to run jobs in parallel on the same machine or on multiple machines you have ssh access to. It can often replace a for loop.
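For example (a hypothetical illustration, assuming a directory of .log files to compress), a serial loop can often be replaced one-for-one:

# Serial: one gzip at a time
for f in *.log; do gzip "$f"; done

# Parallel: one gzip per file, one job per CPU core by default
parallel gzip ::: *.log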
If you have 32 different jobs you want to run on 4 CPUs, a straightforward way to parallelize is to run 8 jobs on each CPU. GNU Parallel instead spawns a new process whenever one finishes, keeping the CPUs active and thus saving time.
Installation
If GNU Parallel is not packaged for your distribution, you can do a personal installation, which does not require root access. It can be done in 10 seconds by doing this:
(wget -O - pi.dk/3 || curl pi.dk/3/ || fetch -o - http://pi.dk/3) | bash
For other installation options see http://git.savannah.gnu.org/cgit/parallel.git/tree/README
Learn more
See more examples: http://www.gnu.org/software/parallel/man.html
Watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
Walk through the tutorial: http://www.gnu.org/software/parallel/parallel_tutorial.html
Sign up for the email list to get support: https://lists.gnu.org/mailman/listinfo/parallel
Upvotes: 1
Reputation: 5269
If you use bash, then the following should work:
#!/bin/bash
for a in {1..10}
do
    for b in {1..10}
    do
        while [ "$(jobs | wc -l)" -ge 5 ]; do  # already 5 background jobs?
            wait -n  # wait for any background job to terminate (bash 4.3+)
        done
        nohup python script.py "$a" "$b" &
    done
done
wait  # wait for the remaining jobs to finish
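Note that wait -n requires bash 4.3 or newer. On an older bash, a coarser fallback (a sketch under that assumption) is to launch the jobs in batches of 5, at the cost of each batch waiting on its slowest member:

#!/bin/bash
# Fallback for bash < 4.3, where `wait -n` is unavailable:
# run the 100 jobs in batches of 5 and wait for each whole batch.
i=0
for a in {1..10}
do
    for b in {1..10}
    do
        nohup python script.py "$a" "$b" &
        i=$((i + 1))
        if [ $((i % 5)) -eq 0 ]; then
            wait  # block until the entire batch of 5 finishes
        fi
    done
done
wait  # wait for any leftover jobs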
Upvotes: 1