awk to Count Sum and Unique improve command

Question

Would like to print based on 2nd column ,count of line items, sum of 3rd column and unique values of first column.Having around 100 InputTest files and not sorted .. Am using below 3 commands to achieve the desired output , would like to know the simplest way ...

InputTest*.txt

abc,xx,5,sss
abc,yy,10,sss
def,xx,15,sss
def,yy,20,sss
abc,xx,5,sss
abc,yy,10,sss
def,xx,15,sss
def,yy,20,sss
ghi,zz,10,sss

Step#1:

cat InputTest*.txt | awk -F, '{key=$2;++a[key];b[key]=b[key]+$3} END {for(i in a) print i","a[i]","b[i]}'

Op#1

xx,4,40
yy,4,60
zz,1,10

Step#2

awk -F ',' '{print $1,$2}' InputTest*.txt | sort | uniq >Op_UniqTest2.txt

Op#2

abc xx
abc yy
def xx
def yy
ghi zz

Step#3

awk   '{print $2}' Op_UniqTest2.txt | sort | uniq -c

Op#3

 2 xx
 2 yy
 1 zz

Desired Output:

xx,4,40,2
yy,4,60,2
zz,1,10,1

Looking for suggestions !!!

Tom Fenech · Accepted Answer

BEGIN { FS = OFS = "," }
{ ++lines[$2]; if (!seen[$2,$1]++) ++diff[$2]; count[$2]+=$3 }
END { for(i in lines) print i, lines[i], count[i], diff[i] }

lines tracks the number of occurrences of each value in column 2
seen records unique combinations of the second and first column, incrementing diff[$2] whenever a unique combination is found. The ++ after seen[$2,$1] means that the condition will only be true the first time the combination is found, as the value of seen[$2,$1] will be increased to 1 and !seen[$2,$1] will be false.
count keeps a total of the third column

$ awk -f avn.awk file
xx,4,40,2
yy,4,60,2
zz,1,10,1

awk to Count Sum and Unique improve command

Answers (2)

Related Questions