Reputation: 1061
I have a file
a x 0 3
a x 0 1
b,c x 4 4
dd x 3 5
dd x 2 5
e,e,t x 5 7
a b 1 9
cc b 2 1
cc b 1 1
e,e,t b 1 2
e,e,t b 1 2
e,e,t b 1 2
for each element in $1$2
, I want print the sum $3, $4
and the number of occurrences/lenght/counts
So that I have
a x 0 4 0 2
b,c x 4 4 1 1
dd x 5 10 2 2
e,e,t x 5 7 1 1
a b 1 9 1 1
cc b 3 2 2 2
e,e,t b 3 6 3 3
I am using
awk -F"\t" '{for(n=2;n<=NF; ++n) a[$1 OFS $2][n]+=$n}
END {for(i in a) {
printf "%s", i
for (n=3; n<=4; ++n) printf "\t%s", a[i][n], a[i][n]++
printf "\n" }}' file
but it's only printing the sums, not the counts
The actual file has many columns: the keys are $4$6$7$8
and the numerical columns are $9-$13
Upvotes: 1
Views: 82
Reputation: 785286
You may use this awk
:
cat sum.awk
BEGIN {
FS = OFS = "\t" # set input/output FS to tab
}
{
k = $1 OFS $2 # create key using $1 tab $2
if (!(k in map3)) # if k is not in map3 save it in an ordered array
ord[++n] = k
map3[k] += $3 # sum of $3 in array map3 using key as k
$3 > 0 && ++fq3[k] # frequency of $3 if it is > 0
map4[k] += $4 # sum of $4 in array map4 using key as k
$4 > 0 && ++fq4[k] # frequency of $4 if it is > 0
}
END {
for(i=1; i<=n; ++i) # print everything by looping through ord array
print ord[i], map3[ord[i]], map4[ord[i]], fq3[ord[i]]+0, fq4[ord[i]]+0
}
Then use it as:
awk -f sum.awk file
a x 0 4 0 2
b,c x 4 4 1 1
dd x 5 10 2 2
e,e,t x 5 7 1 1
a b 1 9 1 1
cc b 3 2 2 2
e,e,t b 3 6 3 3
Upvotes: 1