Madza Farias-Virgens
Madza Farias-Virgens

Reputation: 1061

For each unique occurrence in field, print sum corresponding numerical field and number of occurrences/counts

I have a file

a       x       0   3
a       x       0   1
b,c     x       4   4
dd      x       3   5
dd      x       2   5
e,e,t   x       5   7
a       b       1   9
cc      b       2   1
cc      b       1   1
e,e,t   b       1   2
e,e,t   b       1   2
e,e,t   b       1   2

for each element in $1$2, I want print the sum $3, $4 and the number of occurrences/lenght/counts

So that I have

a       x       0   4   0   2
b,c     x       4   4   1   1
dd      x       5   10  2   2
e,e,t   x       5   7   1   1
a       b       1   9   1   1
cc      b       3   2   2   2   
e,e,t   b       3   6   3   3  

I am using

awk -F"\t" '{for(n=2;n<=NF; ++n) a[$1 OFS $2][n]+=$n}
        END {for(i in a) {
                printf "%s", i
                for (n=3; n<=4; ++n) printf "\t%s", a[i][n], a[i][n]++
                printf "\n" }}' file

but it's only printing the sums, not the counts

The actual file has many columns: the keys are $4$6$7$8 and the numerical columns are $9-$13

Upvotes: 1

Views: 82

Answers (1)

anubhava
anubhava

Reputation: 785286

You may use this awk:

cat sum.awk

BEGIN {
   FS = OFS = "\t"     # set input/output FS to tab 
}
{
   k = $1 OFS $2       # create key using $1 tab $2
   if (!(k in map3))   # if k is not in map3 save it in an ordered array
      ord[++n] = k

   map3[k] += $3       # sum of $3 in array map3 using key as k
   $3 > 0 && ++fq3[k]  # frequency of $3 if it is > 0
   map4[k] += $4       # sum of $4 in array map4 using key as k
   $4 > 0 && ++fq4[k]  # frequency of $4 if it is > 0
}
END {
   for(i=1; i<=n; ++i) # print everything by looping through ord array
      print ord[i], map3[ord[i]], map4[ord[i]], fq3[ord[i]]+0, fq4[ord[i]]+0
}

Then use it as:

awk -f sum.awk file

a       x   0   4   0   2
b,c     x   4   4   1   1
dd      x   5   10  2   2
e,e,t   x   5   7   1   1
a       b   1   9   1   1
cc      b   3   2   2   2
e,e,t   b   3   6   3   3

Upvotes: 1

Related Questions