awk sum column based on repeated occurrence in another column & print each value that's to be summed

Question

I have input similar to this:

A 3
C 1
A 4
B 2
B 2

The output should be:
A total=7 (3+4)
C total=1 (1)
B total=4 (2+2)

Can anyone please tell me how to do this in awk? The input is part of the output of an awk command, hence the request for a solution in awk. Thanks!

Eran Ben-Natan · Accepted Answer

I would like to suggest another way:

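# Sort by key so equal keys become adjacent (-s keeps each key's values in input order),
# then append an empty line so the last group gets flushed. <(...) needs bash, zsh, or ksh.
sort -s -k 1,1 your_file |
cat - <(echo "") |
gawk '
  # Same key as the previous line: extend the running expression and the sum.
  $1 == key {
    line = line "+" $2; sum += $2
  }
  # New key: print the finished group (there is none on line 1), then start a new one.
  $1 != key {
    if (NR > 1) { print key " total=" sum " (" line ")" }
    key = $1
    line = $2
    sum = $2
  }'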
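
As a quick check, the same sort-then-stream idea can be run against the sample input from the question. This is only a sketch under a few assumptions: the function name sum_by_key is made up for illustration, an END rule flushes the last group in place of the appended empty line (which avoids the <(...) syntax), terms are joined with a bare + to match the requested format, and sort -s (stable sort, supported by GNU and BSD sort but not required by POSIX) keeps each key's values in input order.

```shell
# sum_by_key is a hypothetical helper name; it reads "key value" lines on stdin.
sum_by_key() {
  sort -s -k 1,1 |
  awk '
    # First line, or the key changed: flush the finished group and start a new one.
    NR == 1 || $1 != key {
      if (NR > 1) print key " total=" sum " (" line ")"
      key = $1; line = $2; sum = $2
      next
    }
    # Same key as the previous line: extend the expression and the running sum.
    { line = line "+" $2; sum += $2 }
    # Flush the last group, unless the input was empty.
    END { if (NR > 0) print key " total=" sum " (" line ")" }
  '
}

printf 'A 3\nC 1\nA 4\nB 2\nB 2\n' | sum_by_key
```

Note that the groups come out in key order (A, B, C), not in the order in which the keys first appear in the input.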

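How does this differ from the previous answer?

1) This awk program does not use arrays, so its memory use stays constant no matter how many distinct keys the input has. The heavy lifting is left to sort, which can fall back on temporary files. This matters when working on large files.

2) This is closer to the classic AWK style of pattern-action rules applied to a stream, while the previous answer reads more like a general-purpose programming language.

3) If the original order matters, you can do something like this: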

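# Tag each line with its line number, group by key (ties broken by line number),
# then sort the results by the first line number of each group and strip the tag.
gawk '{print $0 " " NR}' your_file |
sort -k 1,1 -k 3,3n | cat - <(echo "") |
gawk '$1 == key {line = line "+" $2; sum += $2}
      $1 != key {if (NR > 1) {print nr " " key " total=" sum " (" line ")"}; key = $1; line = $2; sum = $2; nr = $NF}' |
sort -k 1,1n |
cut -d ' ' -f 2-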
