Matt

Reputation: 826

How to parallelize a for loop that includes screen while limiting the number of processes?

I have a parallelizable for loop in bash and would like to limit the number of jobs that are run in parallel. The script looks like this:

#!/usr/bin/env bash

num_cores=25
num_jobs=100

for ((i = 0; i < num_jobs; i++)); do
    while read -r -a curr_jobs < <(jobs -p -r) \
        && ((${#curr_jobs[@]} >= num_cores)); do
        wait -n
    done
    NAME=job_$i
    screen -S $NAME -d -m bash -c "my bash command"
done

The script is based on an answer to a similar question on Stack Overflow. The difference is that the answer launches Python, whereas my loop launches screen. For some reason I do not understand, this approach does not seem to work with screen.

How can I modify my script to limit the number of parallel screen sessions?

Are there easier/better solutions to this problem?

Upvotes: 2

Views: 410

Answers (2)

iuooip

Reputation: 91

Some time ago, on AIX with only the base packages installed, I did it like this:

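#!/bin/bash
FILE=$1
tmpVAR=0
# -r keeps backslashes in the input lines intact
while read -r line; do
    # $line is intentionally unquoted so its words become separate arguments
    ScriptTakingArgFromFile.sh $line & PID1=$!
    tmpVAR=$((tmpVAR + 1))
    if [ "$tmpVAR" -eq 50 ]; then
        tmpVAR=0
        # echo "STOP 50"
        # pause until the most recently started process exits
        wait $PID1
    fi
done < "$FILE"
wait $PID1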

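This starts 50 processes and then pauses. When the 50th process ends, the next 50 are started. Note that it waits only for the last process of each batch, so earlier ones that take longer may still be running when the next batch begins.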

If you want exactly a given number of processes running at all times, you can store the PIDs in an array and stop launching when the array reaches the desired length. Periodically check those PIDs and remove the ones whose process has ended; the array then shrinks and another process can be started.
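A minimal sketch of that PID-array idea, not something I ran on AIX. The names max_procs, prune_finished and start_job are made up for illustration, `sleep 0.2` stands in for the real command, and it assumes bash (for arrays) plus a sleep that accepts fractional seconds:

```bash
#!/usr/bin/env bash
# Sketch: keep at most max_procs background processes alive by tracking PIDs.

max_procs=3   # hypothetical limit
pids=()       # PIDs of processes we believe are still running

# Remove PIDs whose process has ended. kill -0 sends no signal,
# it only reports whether the process still exists.
prune_finished() {
    local alive=() pid
    for pid in "${pids[@]}"; do
        if kill -0 "$pid" 2>/dev/null; then
            alive+=("$pid")
        fi
    done
    pids=("${alive[@]}")
}

# Block until the array is shorter than max_procs, then start the
# given command in the background and remember its PID.
start_job() {
    prune_finished
    while (( ${#pids[@]} >= max_procs )); do
        sleep 0.1
        prune_finished
    done
    "$@" &
    pids+=("$!")
}

# Placeholder workload: six short jobs, at most three at a time.
for i in 1 2 3 4 5 6; do
    start_job sleep 0.2
done
wait   # let the remaining jobs finish
```

Polling with kill -0 is crude (a PID could in theory be reused), but it needs nothing beyond the shell itself. It only helps when the started command stays a child of the script until it finishes.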

Hope that helps.

Upvotes: 0

Ole Tange

Reputation: 33748

Use GNU Parallel and tmux:

seq 100 | parallel -j25 -N0 --tmux my bash command

Example: Run one sleep for each CPU-core:

seq 100 | parallel --tmux "echo Running {} Job sequence {#}; sleep {}"

GNU Parallel is a general parallelizer and makes it easy to run jobs in parallel on the same machine or on multiple machines you have ssh access to.

If you have 32 different jobs you want to run on 4 CPUs, a straightforward way to parallelize is to run 8 jobs on each CPU:

[Figure: Simple scheduling]

GNU Parallel instead spawns a new process when one finishes - keeping the CPUs active and thus saving time:

[Figure: GNU Parallel scheduling]
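To make the difference between the two strategies concrete, here is a small arithmetic sketch in bash. It does not run GNU Parallel or model its internals; it only compares the total wall time of a fixed split against "hand the next job to whichever worker is free first". The job durations, the worker count and the function names are all made up for illustration:

```bash
#!/usr/bin/env bash
# Hypothetical job durations in seconds and number of workers.
durations=(5 5 5 5 1 1 1 1)
workers=2

# Fixed split: cut the job list into equal consecutive chunks,
# one chunk per worker. Wall time is the slowest chunk.
static_makespan() {
    local per=$(( ${#durations[@]} / workers )) w i sum max=0
    for (( w = 0; w < workers; w++ )); do
        sum=0
        for (( i = w * per; i < (w + 1) * per; i++ )); do
            (( sum += durations[i] ))
        done
        if (( sum > max )); then max=$sum; fi
    done
    echo "$max"
}

# Refill on finish: each job goes to the worker that becomes free first.
dynamic_makespan() {
    local busy=() w d best max=0
    for (( w = 0; w < workers; w++ )); do busy[w]=0; done
    for d in "${durations[@]}"; do
        best=0
        for (( w = 1; w < workers; w++ )); do
            if (( busy[w] < busy[best] )); then best=$w; fi
        done
        (( busy[best] += d ))
    done
    for (( w = 0; w < workers; w++ )); do
        if (( busy[w] > max )); then max=${busy[w]}; fi
    done
    echo "$max"
}

echo "fixed split:      $(static_makespan) s"    # 20 s for the values above
echo "refill on finish: $(dynamic_makespan) s"   # 12 s for the values above
```

With these made-up numbers the fixed split leaves one worker idle for most of the run, while refilling keeps both busy. How much is gained in practice depends entirely on how uneven the job durations are.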

Installation

For security reasons you should install GNU Parallel with your package manager, but if GNU Parallel is not packaged for your distribution, you can do a personal installation, which does not require root access. It can be done in 10 seconds by doing this:

(wget -O - pi.dk/3 || curl pi.dk/3/ || fetch -o - http://pi.dk/3) | bash

For other installation options see http://git.savannah.gnu.org/cgit/parallel.git/tree/README

Learn more

See more examples: http://www.gnu.org/software/parallel/man.html

Watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1

Walk through the tutorial: http://www.gnu.org/software/parallel/parallel_tutorial.html

Sign up for the email list to get support: https://lists.gnu.org/mailman/listinfo/parallel

Upvotes: 1
