Reputation: 1278

Run bash shell in parallel and wait

I have 100 files in a directory and want to process each one in several steps. Step 1 is time-consuming, so I start it in the background. The pseudocode looks like this:

for filename in ~/dir/*; do
  run_step1 "$filename" > "${filename}.out" &
done

for outfile in ~/dir/*.out; do
  run_step2 "$outfile" > "${outfile}.result"
done

My question is: how can I check whether step 1 has completed for a given input file? In C# I would use Thread.Join, but I am not sure whether bash has an equivalent.

Upvotes: 3

Answers (4)

Jonathan Leffler

Reputation: 753725

It looks like you want:

for filename in ~/dir/*
do
    (
    run_step1 "$filename" > "${filename}.out"
    run_step2 "${filename}.out" > "${filename}.result"
    ) &
done
wait

This processes each file in a separate sub-shell, running first step 1 then step 2 on each file, but processing multiple files in parallel.

About the only issue you'll need to worry about is ensuring you don't try running too many processes in parallel. You might want to consider GNU parallel.

You might want to write a trivial script (doit.sh, perhaps):

run_step1 "$1" > "$1.out"
run_step2 "$1.out" > "$1.result"

and then invoke that script from parallel, one file per invocation.
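As a rough sketch of that invocation: with GNU parallel installed, something like `parallel ./doit.sh ::: ~/dir/*` would run the script once per file. The example below swaps in `xargs -P` instead (available in GNU and BSD xargs, though not required by POSIX) so that it runs without extra packages. The scratch directory, the sample files, and the `tr`/`wc` commands standing in for run_step1 and run_step2 are all assumptions made for illustration.

```shell
#!/bin/sh
# Scratch directory standing in for ~/dir (assumption for the demo).
dir=$(mktemp -d)
printf 'abc\n'   > "$dir/one"
printf 'hello\n' > "$dir/two"

# doit.sh from the answer; tr and wc are hypothetical stand-ins
# for run_step1 and run_step2.
script=$(mktemp)
cat > "$script" <<'EOF'
tr 'a-z' 'A-Z' < "$1" > "$1.out"
wc -c < "$1.out" > "$1.result"
EOF

# Run the script once per file, at most 4 at a time.
# NUL-separated names keep paths with spaces intact.
printf '%s\0' "$dir"/* | xargs -0 -n 1 -P 4 sh "$script"
```

xargs returns only after every invocation has finished, so no separate wait is needed here, and the `-P` value caps how many files are processed at once.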

Upvotes: 4

Vality

Reputation: 6607

Try this:

declare -a PROCNUMS
ITERATOR=0
for filename in ~/dir/*; do
    run_step1 "$filename" > "${filename}.out" &
    PROCNUMS[$ITERATOR]=$!
    let "ITERATOR=ITERATOR+1"
done

ITERATOR=0
for outfile in ~/dir/*.out; do
    wait "${PROCNUMS[$ITERATOR]}"
    run_step2 "$outfile" > "${outfile}.result"
    let "ITERATOR=ITERATOR+1"
done

This builds an array of the background process IDs, then waits for each one in order, just before its output is needed. Note that it relies on a one-to-one relationship between the input and output files, on both loops listing them in the same order, and on the directory not changing while the script runs.

Note: for a small performance boost, you could now run the second loop asynchronously too, assuming each file is independent.

I hope this helps, but if you have any questions please comment.

Upvotes: 3

damgad

Reputation: 1446

After the loop that runs step 1, you could write another loop that calls the fg command, which brings the most recently backgrounded job into the foreground.

Be aware that fg can return an error if the process has already finished.

Once the loop of fg calls completes, you can be sure that every step 1 has finished.

Upvotes: 0

Dark Falcon

Reputation: 44181

The Bash builtin wait can wait for a specific background job or all background jobs to complete. The simple approach would be to just insert a wait in between your two loops. If you'd like to be more specific, you could save the PID for each background job and wait PID directly before run_step2 inside the second loop.
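A minimal sketch of the simple approach, with a bare wait between the two loops. The run_step1 and run_step2 functions here are hypothetical stand-ins for the real commands, and a scratch directory replaces ~/dir so the example can be run as is.

```shell
#!/bin/sh
# Hypothetical stand-ins for the real commands in the question.
run_step1() { sleep 1; tr 'a-z' 'A-Z' < "$1"; }
run_step2() { wc -c < "$1"; }

# Scratch directory standing in for ~/dir (assumption for the demo).
dir=$(mktemp -d)
printf 'abc\n'   > "$dir/one"
printf 'hello\n' > "$dir/two"

# Start step 1 for every file in the background.
# The glob is expanded once, before any .out file exists.
for filename in "$dir"/*; do
    run_step1 "$filename" > "$filename.out" &
done

# With no arguments, wait blocks until all background jobs
# started by this shell have exited.
wait

# Every .out file is complete now, so step 2 can read them safely.
for outfile in "$dir"/*.out; do
    run_step2 "$outfile" > "$outfile.result"
done
```

For the more specific variant, `$!` holds the PID of the most recently started background job, and `wait "$pid"` returns that job's exit status, which also makes it possible to skip step 2 for a file whose step 1 failed.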

Upvotes: 2
