Reputation: 29
I have hundreds of files to process. Each file contains millions of rows.
Sample file content:
---------------
12
3
5
---------------
8
0
5
---------------
1
5
56
4
---------------
I need to have the output which looks like below (sum of numbers separated by dashes from previous file):
20
13
66
I used while, if, and else in conjunction with awk, but the if/else logic dramatically slows down the processing. Any ideas how to use pure awk to speed up the calculations?
Upvotes: 3
Views: 118
Reputation: 29
Thanks to all of you who spent your time helping me! Your awk examples are incredibly fast compared to my while/if approach. Thanks also for the link describing the reasons. It appears I created about the worst version of the code I could have written :-/
My version of the code, which works as well but is dramatically slow:
sum=0
while read line
do
    if [ "$line" = "---------------" ]; then
        echo $sum
        sum=0
    else
        sum=`echo $line $sum | awk '{print $1 + $2}'`
    fi
done < input_file.txt
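For what it's worth, a sketch of my own (not from the answers below): the main cost in the loop above is forking an echo-plus-awk pipeline once per line, not the if/else itself. Plain shell arithmetic keeps the same structure with no per-line process. The filename input_file.txt and the recreated sample data are assumptions matching the question:

```shell
# Recreate the sample input from the question (assumed filename)
cat > input_file.txt <<'EOF'
---------------
12
3
5
---------------
8
0
5
---------------
1
5
56
4
---------------
EOF

# Same logic as the original loop, but $((...)) replaces the
# per-line `echo ... | awk` pipeline.
# Note: like the original loop, a leading separator line prints
# an initial 0, since sum starts at 0.
sum=0
while IFS= read -r line; do
    if [ "$line" = "---------------" ]; then
        echo "$sum"
        sum=0
    else
        sum=$((sum + line))
    fi
done < input_file.txt
```

This is still slower than a single awk process reading the whole file, but it avoids the fork/exec per line.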
Thanks again Gurus!
Upvotes: -1
Reputation: 203995
$ awk '/^-+$/{if (s!="") print s; s=""; next} {s+=$0}' file
20
13
66
Note the setting/comparison of s to ""
to distinguish a summed value of zero from s merely being initialized to the null string.
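For reference, running this on the sample input from the question (recreated here under an assumed filename, sample.txt) shows why that guard matters: the leading separator line prints nothing, because s is still the null string there rather than a computed zero:

```shell
# Recreate the sample input from the question
cat > sample.txt <<'EOF'
---------------
12
3
5
---------------
8
0
5
---------------
1
5
56
4
---------------
EOF

# The s!="" guard suppresses output at the leading separator,
# where s is still the null string, not a summed 0.
awk '/^-+$/{if (s!="") print s; s=""; next} {s+=$0}' sample.txt
```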
Upvotes: 3
Reputation: 247022
An alternative; I'm curious how it stacks up speed-wise. (Note that a multi-character/regex RS is a GNU awk extension.)
awk -v RS='\n-+\n' -F'\n' 'NF {s=0; for(i=1; i<=NF; i++) s+=$i; print s}' file ...
Upvotes: 2
Reputation: 67507
You don't need if/else blocks:
$ awk 'FNR>1 && /^----/ {print sum; sum=0; next} {sum+=$1}' file{1,2}
20
13
66
20
13
66
for example, for file1 and file2 as copies of your input. Perhaps you'll run the files one at a time, or for multiple inputs you may want a prefix before each sum, for example
$ awk 'FNR==1{block=0} FNR>1 && /^----/ {print FILENAME, ++block, sum; sum=0; next}
{sum+=$1}' file{1,2}
file1 1 20
file1 2 13
file1 3 66
file2 1 20
file2 2 13
file2 3 66
Upvotes: 3
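One caveat common to the awk answers above, and a sketch of my own (an assumption, since the sample files all end with a separator): if a file's last block has no trailing separator line, its sum is never printed. An END block covers that case:

```shell
# Sample file (assumed name tail_sample.txt) whose last block
# has no closing separator line
cat > tail_sample.txt <<'EOF'
12
3
5
---------------
8
0
5
EOF

# Same approach as above, plus an END block so a final group
# without a trailing separator still gets printed.
awk '/^-+$/{if (s!="") print s; s=""; next} {s+=$0} END{if (s!="") print s}' tail_sample.txt
```

If the file does end with a separator, s is the null string at END and nothing extra is printed, so the variant is safe for both layouts.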