zx8754

Reputation: 56149

Output to the same file sequence

Suppose we have myScript.sh as below:

#!/bin/bash
do something with $1 > bla.txt
do something with bla.txt > temp.txt
...
cat temp.txt >> FinalOutput.txt

Then we run parallel as below:

parallel myScript.sh {} ::: {1..3}

Does it write the output in order? That is, will FinalOutput.txt contain the results for 1 first, then 2, and then 3?

Note: I am currently writing to separate files and merging them in the required order once parallel is complete. I am just wondering if I can avoid this step.

Upvotes: 2

Views: 100

Answers (2)

Ole Tange

Reputation: 33685

The ideal way is to avoid temporary files altogether. That can often be done by using pipes:

parallel 'do something {} | do more | something else' ::: * > FinalOutput

But if that is impossible, use temporary files whose names depend on {#}, which is the job sequence number in GNU Parallel:

doer() {
  # $1 is the input argument, $2 is the job sequence number ({#})
  do something "$1" > "$2.bla"
  do more "$2.bla" > "$2.tmp"
  something else "$2.tmp"
}
export -f doer
parallel doer {} {#} ::: * > FinalOutput

Upvotes: 1

larsks

Reputation: 311516

The processes are run in parallel. Not only is there no guarantee that they will finish in order, there's not even a guarantee that you can have multiple processes writing to the same file like that and end up with anything useful.

If you are going to be writing to the same file from multiple processes, you should implement some sort of locking to prevent corruption. For example:

while ! mkdir FinalOutput.lock; do
    sleep 1
done

cat temp.txt >> FinalOutput.txt
rmdir FinalOutput.lock
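
The lock works because mkdir either creates the directory or fails, so only one process at a time gets past the loop. Here is a small self-contained sketch of that idea, using plain background jobs instead of parallel; append_locked and the temp-N.txt sample files are hypothetical names introduced for illustration:

```shell
#!/bin/bash
# Hypothetical helper: append one file to FinalOutput.txt while holding the lock.
append_locked() {
  # mkdir fails if the directory already exists, so only one writer proceeds.
  while ! mkdir FinalOutput.lock 2>/dev/null; do
    sleep 0.1
  done
  cat "$1" >> FinalOutput.txt
  rmdir FinalOutput.lock   # release the lock
}

rm -f FinalOutput.txt
for i in 1 2 3; do
  printf 'result of %s\n' "$i" > "temp-$i.txt"
  append_locked "temp-$i.txt" &   # three concurrent writers
done
wait

# The lock prevents interleaved writes, but it does not impose any order.
sort FinalOutput.txt
```

Note that a job killed while holding the lock would leave FinalOutput.lock behind, so a real script would probably want a trap to clean it up.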

If order matters, you should have each script write to its own unique file, and then assemble the final output in the correct order after all your parallel jobs have finished.

#!/bin/bash
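# Use per-job file names so that parallel jobs do not overwrite each other
do something with $1 > bla-$1.txt
do something with bla-$1.txt > temp-$1.txt
...
# No append to FinalOutput.txt here; the merge happens after parallel finishes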

And then after parallel has finished:

cat temp-*.txt > FinalOutput.txt
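
One caveat with the glob: it expands in lexical order, so with ten or more jobs temp-10.txt would sort before temp-2.txt. A minimal sketch of a merge in numeric order, assuming the job arguments are the integers 1..N (N is 12 here, and the sample files are generated only to make the example self-contained):

```shell
#!/bin/bash
# Hypothetical sample data standing in for the per-job output files.
for i in $(seq 1 12); do
  echo "result of $i" > "temp-$i.txt"
done

# Concatenate in numeric order rather than in glob (lexical) order.
for i in $(seq 1 12); do
  cat "temp-$i.txt"
done > FinalOutput.txt
```

Zero-padding the file names (temp-01.txt, temp-02.txt, ...) would be another way to keep the plain glob usable.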

Upvotes: 2
