p4guru

Reputation: 1450

Bash shell using AWK to add up several respective columns / fields?

Could someone point me in the right direction on using bash shell scripting and awk to add up several columns/fields and print out a summary?

I want to take stats output in the following format:

REQ_RATE REQ_PROCESSING: 0, REQ_PER_SEC: 1.0, TOT_REQS: 2,
REQ_RATE CACHE_HITS_PER_SEC: 0.5, TOTAL_CACHE_HITS: 10
REQ_RATE REQ_PROCESSING: 0, REQ_PER_SEC: 2.0, TOT_REQS: 0,
REQ_RATE CACHE_HITS_PER_SEC: 0.5, TOTAL_CACHE_HITS: 20
REQ_RATE REQ_PROCESSING: 0, REQ_PER_SEC: 3.0, TOT_REQS: 2,
REQ_RATE CACHE_HITS_PER_SEC: 0.5, TOTAL_CACHE_HITS: 30
REQ_RATE REQ_PROCESSING: 0, REQ_PER_SEC: 4.0, TOT_REQS: 1,
REQ_RATE CACHE_HITS_PER_SEC: 0.5, TOTAL_CACHE_HITS: 40
REQ_RATE REQ_PROCESSING: 0, REQ_PER_SEC: 5.0, TOT_REQS: 0,
REQ_RATE CACHE_HITS_PER_SEC: 0.5, TOTAL_CACHE_HITS: 50

and total each field into a single summary like this:

REQ_RATE REQ_PROCESSING: 0, REQ_PER_SEC: 15.0, TOT_REQS: 5,
REQ_RATE CACHE_HITS_PER_SEC: 2.5, TOTAL_CACHE_HITS: 150

thanks

Upvotes: 1

Views: 1400

Answers (3)

Michael J. Barber

Reputation: 25032

If we consider the structure of the data, there are a few conclusions to be drawn:

  • REQ_RATE carries no information
  • The remainder of the lines can be viewed as key-value pairs
  • The key-value pairs are separated by commas or by line breaks

So take a two-step approach, first processing the lines into cleaner key-value pairs:

sed -e 's/^REQ_RATE //' -e 's/,[[:space:]]*$//' |
  awk -F ', ' -v OFS='\n' '{ $1=$1; print }'

This produces lines with single key-value pairs.
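To make the intermediate format concrete, here is a small sketch that runs the first stage on the first two sample lines from the question. The function name to_pairs is only a label I've picked for illustration; it reads the stats on stdin.

```shell
# to_pairs is a hypothetical helper name: stage one of the pipeline, reading stdin.
to_pairs() {
  sed -e 's/^REQ_RATE //' -e 's/,[[:space:]]*$//' |
    awk -F ', ' -v OFS='\n' '{ $1 = $1; print }'   # $1 = $1 rebuilds the record using OFS
}

printf '%s\n' \
  'REQ_RATE REQ_PROCESSING: 0, REQ_PER_SEC: 1.0, TOT_REQS: 2,' \
  'REQ_RATE CACHE_HITS_PER_SEC: 0.5, TOTAL_CACHE_HITS: 10' |
  to_pairs
# REQ_PROCESSING: 0
# REQ_PER_SEC: 1.0
# TOT_REQS: 2
# CACHE_HITS_PER_SEC: 0.5
# TOTAL_CACHE_HITS: 10
```

Each output line now splits cleanly on ': ' into a key and a number.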

Now pipe the above through another awk stage, summing up the values for each of the keys using an associative array:

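awk -F ': ' '
  { 
    sum[$1] += $2    # accumulate each value under its key
  } 
  END { 
    for (k in sum) { 
      # %g keeps fractional totals such as 2.5, which %d would truncate to 2
      printf("%s: %g, ", k, sum[k]) 
    } 
    printf("\n")
  }' 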

I've done nothing special with the formatting of the output, instead just printing the keys in whatever arbitrary order they are iterated through. Modify the END action if you need something more specific.
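As one possible way to modify the END action, the sketch below prints the five keys from the question in a fixed order, laid out like the requested summary. It assumes the key names are exactly those in the sample data; summarize_stats is a hypothetical name, and the function reads the stats on stdin.

```shell
# summarize_stats is a hypothetical helper name; it reads the raw stats on stdin.
# Assumption: the only keys present are the five shown in the question.
summarize_stats() {
  sed -e 's/^REQ_RATE //' -e 's/,[[:space:]]*$//' |
    awk -F ', ' -v OFS='\n' '{ $1 = $1; print }' |
    awk -F ': ' '
      { sum[$1] += $2 }
      END {
        # Look the keys up by name so the output order is fixed.
        printf("REQ_RATE REQ_PROCESSING: %d, REQ_PER_SEC: %.1f, TOT_REQS: %d,\n",
               sum["REQ_PROCESSING"], sum["REQ_PER_SEC"], sum["TOT_REQS"])
        printf("REQ_RATE CACHE_HITS_PER_SEC: %.1f, TOTAL_CACHE_HITS: %d\n",
               sum["CACHE_HITS_PER_SEC"], sum["TOTAL_CACHE_HITS"])
      }'
}
```

Call it as `summarize_stats < stats.txt`, where stats.txt stands in for wherever your stats come from. The rates use %.1f so fractional totals survive, and the counters use %d.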

Upvotes: 1

jaypal singh

Reputation: 77075

Does this work for you?

Your File:

[jaypal:~/Temp] cat file
REQ_RATE REQ_PROCESSING: 0, REQ_PER_SEC: 1.0, TOT_REQS: 2,
REQ_RATE CACHE_HITS_PER_SEC: 0.5, TOTAL_CACHE_HITS: 10
REQ_RATE REQ_PROCESSING: 0, REQ_PER_SEC: 2.0, TOT_REQS: 0,
REQ_RATE CACHE_HITS_PER_SEC: 0.5, TOTAL_CACHE_HITS: 20
REQ_RATE REQ_PROCESSING: 0, REQ_PER_SEC: 3.0, TOT_REQS: 2,
REQ_RATE CACHE_HITS_PER_SEC: 0.5, TOTAL_CACHE_HITS: 30
REQ_RATE REQ_PROCESSING: 0, REQ_PER_SEC: 4.0, TOT_REQS: 1,
REQ_RATE CACHE_HITS_PER_SEC: 0.5, TOTAL_CACHE_HITS: 40
REQ_RATE REQ_PROCESSING: 0, REQ_PER_SEC: 5.0, TOT_REQS: 0,
REQ_RATE CACHE_HITS_PER_SEC: 0.5, TOTAL_CACHE_HITS: 50

Test:

[jaypal:~/Temp] sed 'N;s/\n/ /' file |
awk -F"[:,]" '{a=a+$2;b=b+$4;c=c+$6;d=d+$8;e=e+$10} 
END{printf ("%s: %.1f,%s: %.1f,%s: %.1f,\n%s: %.1f,%s: %.1f\n", $1,a,$3,b,$5,c,$7,d,$9,e)}'
REQ_RATE REQ_PROCESSING: 0.0, REQ_PER_SEC: 15.0, TOT_REQS: 5.0,
REQ_RATE CACHE_HITS_PER_SEC: 2.5, TOTAL_CACHE_HITS: 150.0

Upvotes: 1

kev

Reputation: 161614

awk is really easy to use.

$ awk '/REQ_PROCESSING/{x+=$3; y+=$5; z+=$7}; END{print x, y, z}' input.txt
0 15 5

I think you can do the rest. Happy coding!
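For anyone who wants "the rest" spelled out, here is one way it might look: a second pattern for the cache lines, plus printf calls that mimic the layout asked for in the question. The field positions assume the exact whitespace-separated format of the sample data, and total_stats is just an illustrative name.

```shell
# total_stats is a hypothetical helper name; pass a file name or pipe data on stdin.
# Assumption: fields are laid out exactly as in the sample input.
total_stats() {
  awk '
    # A field such as "1.0," converts to the number 1.0 when used in arithmetic.
    /REQ_PROCESSING/     { proc += $3; rate += $5; reqs += $7 }
    /CACHE_HITS_PER_SEC/ { hit_rate += $3; hits += $5 }
    END {
      printf("REQ_RATE REQ_PROCESSING: %d, REQ_PER_SEC: %.1f, TOT_REQS: %d,\n", proc, rate, reqs)
      printf("REQ_RATE CACHE_HITS_PER_SEC: %.1f, TOTAL_CACHE_HITS: %d\n", hit_rate, hits)
    }' "$@"
}
```

With the question's data in input.txt, `total_stats input.txt` should print the two summary lines the asker wanted.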

Upvotes: 3
