Reputation: 1278
I have 100 files in a directory and want to process each one with several steps; step1 is time-consuming. The pseudocode is something like:
for filename in ~/dir/*; do
    run_step1 "$filename" > "${filename}.out" &
done
for outfile in ~/dir/*.out; do
    run_step2 "$outfile" > "${outfile}.result"
done
My question is: how can I check whether step1 is complete for a given input file? I used to use threads.join in C#, but I'm not sure whether the bash shell has an equivalent.
Upvotes: 3
Views: 2517
Reputation: 753725
It looks like you want:
for filename in ~/dir/*
do
    (
        run_step1 "$filename" > "${filename}.out"
        run_step2 "${filename}.out" > "${filename}.result"
    ) &
done
wait
This processes each file in a separate sub-shell, running step 1 and then step 2 on each file, while processing multiple files in parallel. The wait at the end ensures the script does not continue until every sub-shell has finished.
About the only issue you'll need to worry about is ensuring you don't try running too many processes in parallel. You might want to consider GNU parallel.
You might want to write a trivial script (doit.sh, perhaps):
run_step1 "$1" > "$1.out"
run_step2 "$1.out" > "$1.result"
and then invoke that script from parallel, one file per invocation.
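A minimal sketch of such an invocation (my own illustration, assuming GNU parallel is installed and doit.sh sits in the current directory) might be:

parallel -j 8 bash doit.sh ::: ~/dir/*

Here -j 8 caps the number of files processed at once; adjust the limit to suit your machine.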
Upvotes: 4
Reputation: 6607
Try this:
declare -a PROCNUMS

ITERATOR=0
for filename in ~/dir/*; do
    run_step1 "$filename" > "${filename}.out" &
    PROCNUMS[$ITERATOR]=$!            # remember the PID of this background job
    let "ITERATOR=ITERATOR+1"
done

ITERATOR=0
for outfile in ~/dir/*.out; do
    wait ${PROCNUMS[$ITERATOR]}       # block until the matching step1 job has finished
    run_step2 "$outfile" > "${outfile}.result"
    let "ITERATOR=ITERATOR+1"
done
This builds an array of the PIDs of the created processes and then waits for each one, in order, as it is needed. Note that it relies on there being a 1-to-1 relationship between in and out files and on the directory not changing while it is running.
Note: for a small performance boost you can run the second loop asynchronously too if you like, assuming each file is independent (see the sketch below).
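A minimal sketch of that asynchronous variant (my own illustration, not part of the original answer) backgrounds run_step2 as well and adds a final wait:

ITERATOR=0
for outfile in ~/dir/*.out; do
    wait ${PROCNUMS[$ITERATOR]}                   # wait for the matching step1 job
    run_step2 "$outfile" > "${outfile}.result" &  # then run step2 in the background
    let "ITERATOR=ITERATOR+1"
done
wait                                              # wait for all step2 jobs to finish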
I hope this helps, but if you have any questions please comment.
Upvotes: 3
Reputation: 1446
After the loop that executes step1 you could write another loop that executes the fg command, which moves the most recently backgrounded process into the foreground.
You should be aware that fg can return an error if a process has already finished.
After the loop of fg calls you can be sure that all the step1 jobs have finished.
Upvotes: 0
Reputation: 44181
The Bash builtin wait can wait for a specific background job or for all background jobs to complete. The simple approach would be to just insert a wait in between your two loops. If you'd like to be more specific, you could save the PID for each background job and wait PID directly before run_step2 inside the second loop.
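A minimal sketch of that more specific variant (my own illustration, assuming bash 4+ for associative arrays) keys each PID by its input file:

declare -A STEP1_PID                      # associative array: filename -> step1 PID

for filename in ~/dir/*; do
    run_step1 "$filename" > "${filename}.out" &
    STEP1_PID[$filename]=$!               # remember this file's step1 PID
done

for filename in "${!STEP1_PID[@]}"; do
    wait "${STEP1_PID[$filename]}"        # block until step1 for this file is done
    run_step2 "${filename}.out" > "${filename}.result"
done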
Upvotes: 2