Reputation: 570
I have a directory with files that I want to process one by one and for which each output looks like this:
==== S=721 I=47 D=654 N=2964 WER=47.976% (1422)
Then I want to calculate the average percentage (column 6) by piping the output to AWK. I would prefer to do this all in one script and wrote the following code:
for f in $dir; do
echo -ne "$f "
process $f
done | awk '{print $7}' | awk -F "=" '{sum+=$2}END{print sum/NR}'
When I run this several times, I often get different results although in my view nothing really changes. The result is almost always incorrect though.
However, if I only put the for
loop in the script and pipe to AWK on the command line, the result is always the same and correct.
What is the difference and how can I change my script to achieve the correct result?
Upvotes: 0
Views: 326
Reputation: 11
# To get the percentage of all files
Percs=$(sed -r 's/.*WER=([[:digit:].]*).*/\1/' *)
# The divisor
Lines=$(wc -l <<< "$Percs")
# To change new lines into spaces
P=$(echo $Percs)
# Execute one time without the bc. It's easier to understand
echo "scale=3; (${P// /+})/$Lines" | bc
Upvotes: 0
Reputation: 265
Guessing a little about what you're trying to do, and without more details it's hard to say what exactly is going wrong.
for f in $dir; do
unset TEMPVAR
echo -ne "$f "
TEMPVAR=$(process $f | awk '{print $7}')
ARRAY+=($TEMPVAR)
done
I would append all your values to an array inside your for
loop. Now all your percentages are in $ARRAY
. It should be easy to calculate the average value, using whatever tool you like.
This will also help you troubleshoot. If you get too few elements in the array ${#ARRAY[@]}
then you will know where your loop is terminating early.
Upvotes: 1