SirishaS

Reputation: 23

Waiting on jobs in bash: limiting the number of parallel jobs, then waiting for all of them to finish before continuing with the rest of the pipeline

I am running GNU bash, version 3.2.39(1)-release (x86_64-pc-linux-gnu). I have a question about waiting on jobs run in subshells: I want to cap the number of processes running in parallel, and then wait for the remaining subshell jobs to finish before the next step of the pipeline is executed (if I am making sense here).

Essentially, my pseudocode looks like this:

    MAX_PROCS=3
    for (( k = 0; k < kmerlen; k += 1 ))
    do
        (
            ### Running a perl script here for each k (this script is a memory hog)...
        ) &
        while [ "$(ps -e | grep 'perlScriptAbove' | grep -v grep | wc -l)" -gt "${MAX_PROCS}" ]
        do
            wait
        done
    done

    ### wait  <- works fine without this wait, but I need all kmerlen jobs to finish first to proceed to the next part of the pipeline
    ## Run the rest of the pipeline...

The first wait statement, inside the while loop, works fine, spawning 3 jobs. But when I add the second wait statement, that property is lost, and the number of subshells spawned is equal to my kmerlen.

My apologies if this has been answered before, but I couldn't find an existing answer.

Thanks a lot.

Upvotes: 2

Views: 2126

Answers (3)

Ole Tange

Reputation: 33748

GNU Parallel is made for this kind of task. For example, to gzip all .txt files in parallel and concatenate the results into one big .gz file:

    parallel gzip -c ::: *.txt > out.gz

Watch the intro videos to learn more: http://www.youtube.com/watch?v=OpaiGYxkSuQ

Upvotes: 3

Drakosha

Reputation: 12165

Simply calling wait with no arguments waits for all the background jobs started by the shell, which looks like exactly what you need.

I.e. your code should be something like:

    while (not all jobs spawned)   # e.g. you want to run 40 jobs in total
      spawn as many jobs as you want running in parallel (e.g. 4 jobs)
      wait
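The pseudocode above could be written in bash roughly as follows. This is a minimal sketch, assuming the batch size is `MAX_PROCS` from the question; `run_job`, `run_batches`, and the value of `kmerlen` are hypothetical stand-ins, and `run_job` merely sleeps in place of the Perl script. It uses only features available in bash 3.2.

```bash
#!/usr/bin/env bash
# Batch approach: start up to MAX_PROCS jobs, wait for the whole batch, repeat.
MAX_PROCS=3
kmerlen=10              # hypothetical total number of jobs

run_job() {             # hypothetical stand-in for the Perl script
    sleep 0.1
    echo "k=$1 done"
}

run_batches() {
    local k
    for (( k = 0; k < kmerlen; k++ )); do
        run_job "$k" &
        # After every MAX_PROCS launches, block until the whole batch has finished.
        if (( (k + 1) % MAX_PROCS == 0 )); then
            wait
        fi
    done
    wait                # catch the final, possibly partial, batch
}

run_batches
echo "all jobs finished; run the rest of the pipeline here"
```

The trade-off is that each batch takes as long as its slowest job, so some slots may sit idle while one long job finishes.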

Upvotes: 5

Kyle Burton

Reputation: 27568

Not exactly bash, but it does what you're asking: parallel-jobs is a Perl program I wrote to do exactly this. You specify a file of "jobs", where each line is a job (a bash one-liner), and a maximum number of jobs to execute in parallel, and it keeps that many running until all the jobs have completed.

It works with the standard install of Perl (no additional modules required). You may also want to look into GNU Parallel, which is very similar.
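For readers who want to stay in plain bash, the keep-N-running behaviour described above can be approximated by polling the shell's job table. This is a minimal sketch and not the parallel-jobs program itself; `run_job`, `run_throttled`, and the value of `kmerlen` are hypothetical stand-ins. Polling is used because bash 3.2 has no `wait -n` (that option arrived in bash 4.3).

```bash
#!/usr/bin/env bash
# Rolling window: keep at most MAX_PROCS jobs running, then wait for all of them.
MAX_PROCS=3
kmerlen=10              # hypothetical total number of jobs

run_job() {             # hypothetical stand-in for the Perl script
    sleep 0.2
    echo "k=$1 done"
}

run_throttled() {
    local k
    for (( k = 0; k < kmerlen; k++ )); do
        # Poll until fewer than MAX_PROCS background jobs are still running.
        while (( $(jobs -rp | wc -l) >= MAX_PROCS )); do
            sleep 0.1
        done
        run_job "$k" &
    done
    wait                # block until every remaining job has finished
}

run_throttled
echo "all jobs finished; run the rest of the pipeline here"
```

Counting the shell's own jobs with `jobs -rp` avoids matching unrelated processes, which can happen when grepping the output of `ps -e`.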

Upvotes: 2
