Reputation: 29
I have hundreds of files to process. Each file contains millions of rows.
Sample file content:
---------------
12
3
5
---------------
8
0
5
---------------
1
5
56
4
---------------
I need to have the output which looks like below (sum of numbers separated by dashes from previous file):
20
13
66
I used while, if, and else in conjunction with awk, but the if/else logic dramatically slows down the processing. Any ideas how to use pure awk to speed up the calculations?
Upvotes: 3
Views: 118
Reputation: 29
Thanks to all of you who spent your time helping me! Your awk examples are incredibly fast compared to my while/if approach. Thanks also for the link describing the reasons. It appears I created about the worst version of the code I could have written :-/
My version of the code, which works as well but is dramatically slow:
sum=0
while read line
do
    if [ "$line" = "---------------" ]; then
        echo $sum
        sum=0
    else
        sum=`echo $line $sum | awk '{print $1 + $2}'`
    fi
done < input_file.txt
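For what it's worth, a sketch of my own (not from the answers below): the main cost in the loop above is forking an echo-plus-awk pipeline once per line, not the if/else itself. Plain shell arithmetic keeps the same structure with no per-line process. The filename input_file.txt and the recreated sample data are assumptions matching the question:

```shell
# Recreate the sample input from the question (assumed filename)
cat > input_file.txt <<'EOF'
---------------
12
3
5
---------------
8
0
5
---------------
1
5
56
4
---------------
EOF

# Same logic as the original loop, but $((...)) replaces the
# per-line `echo ... | awk` pipeline.
# Note: like the original loop, a leading separator line prints
# an initial 0, since sum starts at 0.
sum=0
while IFS= read -r line; do
    if [ "$line" = "---------------" ]; then
        echo "$sum"
        sum=0
    else
        sum=$((sum + line))
    fi
done < input_file.txt
```

This is still slower than a single awk process reading the whole file, but it avoids the fork/exec per line.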
Thanks again Gurus!
Upvotes: -1
Reputation: 203995
$ awk '/^-+$/{if (s!="") print s; s=""; next} {s+=$0}' file
20
13
66
Note the setting/comparison of s to ""
to distinguish a summed value of zero from s merely being initialized to the null string.
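For reference, running this on the sample input from the question (recreated here under an assumed filename, sample.txt) shows why that guard matters: the leading separator line prints nothing, because s is still the null string there rather than a computed zero:

```shell
# Recreate the sample input from the question
cat > sample.txt <<'EOF'
---------------
12
3
5
---------------
8
0
5
---------------
1
5
56
4
---------------
EOF

# The s!="" guard suppresses output at the leading separator,
# where s is still the null string, not a summed 0.
awk '/^-+$/{if (s!="") print s; s=""; next} {s+=$0}' sample.txt
```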
Upvotes: 3
Reputation: 247022
An alternative; I'm curious how it stacks up speed-wise. (Note that a multi-character/regex RS is a GNU awk extension.)
awk -v RS='\n-+\n' -F'\n' 'NF {s=0; for(i=1; i<=NF; i++) s+=$i; print s}' file ...
Upvotes: 2
Reputation: 67507
You don't need if/else blocks:
$ awk 'FNR>1 && /^----/ {print sum; sum=0; next} {sum+=$1}' file{1,2}
20
13
66
20
13
66
for example, for file1 and file2 as copies of your input. Perhaps you'll run the files one at a time, or for multiple inputs you may want a prefix before each sum, for example
$ awk 'FNR==1{block=0} FNR>1 && /^----/ {print FILENAME, ++block, sum; sum=0; next}
{sum+=$1}' file{1,2}
file1 1 20
file1 2 13
file1 3 66
file2 1 20
file2 2 13
file2 3 66
Upvotes: 3
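One caveat common to the awk answers above, and a sketch of my own (an assumption, since the sample files all end with a separator): if a file's last block has no trailing separator line, its sum is never printed. An END block covers that case:

```shell
# Sample file (assumed name tail_sample.txt) whose last block
# has no closing separator line
cat > tail_sample.txt <<'EOF'
12
3
5
---------------
8
0
5
EOF

# Same approach as above, plus an END block so a final group
# without a trailing separator still gets printed.
awk '/^-+$/{if (s!="") print s; s=""; next} {s+=$0} END{if (s!="") print s}' tail_sample.txt
```

If the file does end with a separator, s is the null string at END and nothing extra is printed, so the variant is safe for both layouts.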