VNA
VNA

Reputation: 625

awk to Count Sum and Unique improve command

Would like to print based on 2nd column ,count of line items, sum of 3rd column and unique values of first column.Having around 100 InputTest files and not sorted .. Am using below 3 commands to achieve the desired output , would like to know the simplest way ...

InputTest*.txt

abc,xx,5,sss
abc,yy,10,sss
def,xx,15,sss
def,yy,20,sss
abc,xx,5,sss
abc,yy,10,sss
def,xx,15,sss
def,yy,20,sss
ghi,zz,10,sss

Step#1:

cat InputTest*.txt | awk -F, '{key=$2;++a[key];b[key]=b[key]+$3} END {for(i in a) print i","a[i]","b[i]}'

Op#1

xx,4,40
yy,4,60
zz,1,10

Step#2

awk -F ',' '{print $1,$2}' InputTest*.txt | sort | uniq >Op_UniqTest2.txt

Op#2

abc xx
abc yy
def xx
def yy
ghi zz

Step#3

awk   '{print $2}' Op_UniqTest2.txt | sort | uniq -c

Op#3

 2 xx
 2 yy
 1 zz

Desired Output:

xx,4,40,2
yy,4,60,2
zz,1,10,1

Looking for suggestions !!!

Upvotes: 0

Views: 888

Answers (2)

Tom Fenech
Tom Fenech

Reputation: 74625

BEGIN { FS = OFS = "," }
{ ++lines[$2]; if (!seen[$2,$1]++) ++diff[$2]; count[$2]+=$3 }
END { for(i in lines) print i, lines[i], count[i], diff[i] }
  • lines tracks the number of occurrences of each value in column 2
  • seen records unique combinations of the second and first column, incrementing diff[$2] whenever a unique combination is found. The ++ after seen[$2,$1] means that the condition will only be true the first time the combination is found, as the value of seen[$2,$1] will be increased to 1 and !seen[$2,$1] will be false.
  • count keeps a total of the third column
$ awk -f avn.awk file
xx,4,40,2
yy,4,60,2
zz,1,10,1

Upvotes: 1

jaypal singh
jaypal singh

Reputation: 77105

Using awk:

$ awk '
BEGIN { FS = OFS = "," }
{ keys[$2]++; sum[$2]+=$3 } !seen[$1,$2]++ { count[$2]++ }
END   { for(key in keys) print key, keys[key], sum[key], count[key] }
' file
xx,4,40,2
yy,4,60,2
zz,1,10,1

Set the input and output field separator to , in BEGIN block. We use arrays keys to identify and count keys. sum array keeps the sum for each keys. count allows us to keep track of unique column1 for each of column2 values.

Upvotes: 1

Related Questions