user3133271

Reputation: 25

awk sum column based on repeated occurrence in another column & print each value that's to be summed

I have input similar to this:

A 3
C 1
A 4
B 2
B 2

The output should be:
A total=7 (3+4)
C total=1 (1)
B total=4 (2+2)

Can anyone please tell me how to do this in awk? The input comes from the output of an earlier awk command, hence the request for an awk solution. Thanks!

Upvotes: 2

Views: 1195

Answers (2)

Eran Ben-Natan

Reputation: 2615

I would like to suggest another way:

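sort -s -k 1,1 your_file |
cat - <(echo "") |
gawk '
  # same key as the previous line: extend the list and the running sum
  $1==key {
    line=line "+" $2; sum+=$2
  }
  # new key (or the empty line appended by cat): print the finished group, start the next
  $1 != key {
    if (NR>1){print key " total=" sum " (" line ")"}
    key=$1
    line=$2
    sum=$2
  }'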
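Appending an empty line with cat - <(echo "") only serves to make the last group print. An awk END rule can do the same job, which also removes the bash-only process substitution. A minimal sketch, assuming whitespace-separated "key value" lines; sum_groups is a hypothetical helper name, and sort -s is a GNU/BSD option rather than POSIX:

```shell
# sum_groups is a hypothetical helper name; it reads "key value" lines on stdin.
# sort -s (stable, GNU/BSD option) keeps equal keys in their input order.
sum_groups() {
  sort -s -k 1,1 |
  awk '
    # new key: print the finished group (if any) and start a new one
    $1 != key {
      if (NR > 1) print key " total=" sum " (" line ")"
      key = $1; line = $2; sum = $2 + 0
      next
    }
    # same key as the previous line: extend the list and the running sum
    { line = line "+" $2; sum += $2 }
    # flush the last group once the input is exhausted
    END { if (NR > 0) print key " total=" sum " (" line ")" }
  '
}

printf 'A 3\nC 1\nA 4\nB 2\nB 2\n' | sum_groups
# A total=7 (3+4)
# B total=4 (2+2)
# C total=1 (1)
```

Memory use stays constant in awk: only the current key, list and sum are held at any time.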

What are the differences?

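1) This awk does not use arrays: it keeps only the current key's sum and value list in memory, so its memory use does not grow with the number of distinct keys. This matters when working on large files (sort itself can handle inputs larger than memory by using temporary files).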

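2) It processes the input as a stream of records, which is closer to the AWK way, whereas the other answer accumulates everything in arrays, more like a general-purpose programming language would.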

3) If the original order matters, you can do something like this:

gawk '{print $0 " " NR}' your_file |
sort -s -k 1,1 | cat - <(echo "") |
gawk '
  # nr remembers the input line number of the first record of each group
  $1==key {line=line "+" $2; sum+=$2}
  $1 != key {if (NR>1){print nr " " key " total=" sum " (" line ")"}; key=$1; line=$2; sum=$2; nr=$NF}' |
sort -k 1,1n |
cut -d ' ' -f 2-

Upvotes: 1

Håkon Hægland

Reputation: 40758

You can try the following code:

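awk '
{
    # running total per key
    a[$1]+=$2
    # "+"-separated list of the values seen for this key
    b[$1]=(b[$1]=="")?$2:(b[$1]"+"$2)
}
END {
    # print every key with its total and its list of values
    for (i in a)
        print i" total="a[i]" ("b[i]")"
}' file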

with output (the line order may differ, since for (i in a) visits keys in an unspecified order):

A total=7 (3+4)
B total=4 (2+2)
C total=1 (1)

Upvotes: 0
