JohnDro

Reputation: 90

Gawk distinct and sum column

I am very new to Linux and to awk, and I couldn't find an answer to the following question:

I want to use awk, and my file is structured like this:

Date ID Size
2016-11-09 688 47
2016-11-09 688 56
2016-11-09 31640 55

Now I want to sum up the Size for all lines that share the same Date and ID, and export the result to a .csv file. The output should look like this:

Date,ID,Size
2016-11-09,688,103
2016-11-09,31640,55

I really need your help, because I could not figure out how to do it on my own, thank you.

Upvotes: 0

Views: 101

Answers (1)

Ed Morton

Reputation: 204228

If your input is really sorted by Date and ID, as in your sample, then you should use this:

$ cat tst.awk
BEGIN { OFS="," }                                        # separate output fields with commas
NR==1 { $1=$1; print; next }                             # rebuild the header with OFS and print it
{ curr = $1 OFS $2 }                                     # key for the current line: Date,ID
(curr != prev) && (NR > 2) { print prev, sum; sum=0 }    # key changed: print the previous group's total
{ prev = curr; sum += $3 }                               # remember the key, accumulate Size
END { print prev, sum }                                  # print the last group's total

$ awk -f tst.awk file
Date,ID,Size
2016-11-09,688,103
2016-11-09,31640,55

rather than saving the whole file in memory. Note that this approach also produces output in the same order as the input, whereas any for .. in .. loop in an END section prints the output in random (hash) order.
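For contrast, here is a minimal sketch (not part of the answer above, and the file name sums.awk is just an illustration) of the in-memory approach being advised against: it keys an array on Date and ID, accumulates the sums in memory, and only prints in the END block, so the output order depends on awk's internal hash iteration.

$ cat sums.awk
BEGIN { OFS="," }
NR==1 { $1=$1; print; next }                  # print the header
{ sum[$1 OFS $2] += $3 }                      # accumulate Size per Date,ID key in memory
END { for (key in sum) print key, sum[key] }  # output order is not guaranteed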

Upvotes: 2
