Barbot

Reputation: 123

Using xargs for parallel Python scripts

I currently have a bash script, script.sh, with two nested loops. The first enumerates possible values for a, and the second enumerates possible values for b, like

#!/bin/bash
for a in {1..10}
do
    for b in {1..10}
    do
        nohup python script.py $a $b &
    done
done

So this spawns 100 Python processes running script.py, one per (a, b) pair. However, my machine has only 5 cores, so I want to cap the number of concurrent processes at 5 to avoid thrashing and wasteful context switching. The goal is to keep 5 processes running at all times until all 100 have finished.

xargs seems to be one way to do this, but I don't know how to pass these arguments to xargs. I've checked other similar questions but don't understand the surrounding bash jargon well enough to know what's happening. For example, I tried

seq 1 | xargs -i --max-procs=5 bash script.sh

but this doesn't seem to do anything - script.sh runs as before and still spawns off 100 processes.

I assume I'm misunderstanding how xargs works.

Thanks!

Upvotes: 2

Views: 1979

Answers (3)

Charles Duffy

Reputation: 295363

This would actually look more like:

#!/bin/bash
for a in {1..10}; do
  for b in {1..10}; do
    printf '%s\0' "$a" "$b"
  done
done | xargs -0 -x -n 2 -P 5 python script.py

Note that there's no nohup, nor any & -- to track the number of concurrent invocations, xargs needs to be directly executing the Python script, and that process can't exit until it's complete.

The non-standard (but widely available) -0 extension requires input to be in NUL-delimited form (as created with printf '%s\0'); this ensures correct behavior with arguments having spaces, quotes, backslashes, etc.

The likewise non-standard -P 5 sets the maximum number of processes (in a way slightly more portable than --max-procs=5, which is supported on GNU but not modern BSD xargs).

The -n 2 indicates that each instance of the Python script receives only two arguments, thus starting one per pair of inputs.

The -x (used in conjunction with -n 2) indicates that if a single Python instance can't be given two arguments (for instance, if the arguments are so long that both can't fit on a single command line), this should be treated as a failure, rather than invoking a Python instance with only one argument.
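To sanity-check the flag combination without a real script.py, the same pipeline can be run with a stand-in command; here a small sh -c snippet (purely illustrative) just echoes the two arguments it receives per invocation:

```shell
# Emit NUL-delimited (a,b) pairs, then run at most 5 workers at a time,
# two arguments per worker. `sh -c '...' _` stands in for `python script.py`;
# inside it, $1 and $2 are the pair delivered by xargs.
for a in 1 2; do
  for b in 1 2; do
    printf '%s\0' "$a" "$b"
  done
done | xargs -0 -x -n 2 -P 5 sh -c 'echo "a=$1 b=$2"' _
# prints the four (a,b) pairs; order may vary because of -P
```

With -P 5 the output order is not deterministic, which is a useful reminder that the real script.py should not assume its (a, b) pairs arrive in loop order.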

Upvotes: 5

Ole Tange

Reputation: 33685

GNU Parallel is made for exactly these kinds of jobs:

parallel python script.py ::: {1..10} ::: {1..10}

If you need $a and $b placed differently you can use {1} and {2} to refer to the two input sources:

parallel python script.py --option-a {1} --option-b {2} ::: {1..10} ::: {1..10}

GNU Parallel is a general parallelizer and makes it easy to run jobs in parallel on the same machine or on multiple machines you have ssh access to. It can often replace a for loop.

If you have 32 different jobs you want to run on 4 CPUs, a straightforward way to parallelize is to run 8 jobs on each CPU:

[Diagram: simple scheduling]

GNU Parallel instead spawns a new process when one finishes - keeping the CPUs active and thus saving time:

[Diagram: GNU Parallel scheduling]

Installation

If GNU Parallel is not packaged for your distribution, you can do a personal installation, which does not require root access. It can be done in 10 seconds by doing this:

(wget -O - pi.dk/3 || curl pi.dk/3/ || fetch -o - http://pi.dk/3) | bash

For other installation options see http://git.savannah.gnu.org/cgit/parallel.git/tree/README

Learn more

See more examples: http://www.gnu.org/software/parallel/man.html

Watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1

Walk through the tutorial: http://www.gnu.org/software/parallel/parallel_tutorial.html

Sign up for the email list to get support: https://lists.gnu.org/mailman/listinfo/parallel

Upvotes: 1

Oleg Andriyanov

Reputation: 5269

If you use bash (4.3 or newer, which provides wait -n), the following should work:

#!/bin/bash

for a in {1..10}
do
    for b in {1..10}
    do
        while [ "$(jobs -r | wc -l)" -ge 5 ]  # already 5 background jobs?
        do
            wait -n   # wait for any background job to terminate
        done
        nohup python script.py "$a" "$b" &
    done
done
wait   # let the final batch of jobs finish
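The throttling can be watched in isolation with a stand-in workload; the following minimal sketch (my own illustration, using sleep in place of python script.py, and assuming bash 4.3+ for wait -n) launches 10 jobs but keeps at most 3 in flight:

```shell
#!/bin/bash
# Launch 10 dummy jobs, capped at 3 concurrent;
# `sleep 0.2` stands in for `python script.py $a $b`.
for i in $(seq 1 10); do
  while [ "$(jobs -r | wc -l)" -ge 3 ]; do
    wait -n          # block until one background job exits
  done
  sleep 0.2 &
done
wait                 # drain the remaining jobs
echo "all jobs finished"
```

Note that the cap check must come before launching each job; if the launch sits in an else branch instead, the (a, b) pair for that iteration is silently skipped whenever the limit is hit.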

Upvotes: 1
