calculating mean

I have a number of files which each contain reviews of a hotel. For each file I would like to write a script which calculates the mean of all values on lines that start with Overall.

Upvotes: 1

Views: 230

Answers (2)

Lars Fischer

Reputation: 10229

Something like

gawk -F'>' '
BEGINFILE {sum=0; count=0}
/<Overall>/ {count++; sum+=$2}
# Strip the .dat extension; skip files without any <Overall> line.
ENDFILE {if (count) printf("%s: %4.1f\n", gensub(/\.dat$/, "", 1, FILENAME), sum/count)}
' hotel*.dat

will print, e.g., hotel1: 8.7. Note that BEGINFILE, ENDFILE and gensub are GNU awk (gawk) extensions, so this needs gawk.

It uses > as the field delimiter, which conveniently puts the number after <Overall> into $2. This only works as long as the > in <Overall>xyz is the first > on that line.
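To see the splitting on a single line, here is a tiny check. The sample line is made up for illustration, and split_demo is just a throwaway variable name:

```shell
# Hypothetical sample line; with ">" as the separator,
# $1 is "<Overall" and $2 is the number that follows the tag.
split_demo=$(printf '<Overall>4\n' | awk -F'>' '{ print $1 "|" $2 }')
echo "$split_demo"
```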

The pattern /<Overall>/ restricts the summation to lines containing <Overall>.
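BEGINFILE and ENDFILE are gawk extensions. If only a POSIX awk is available, roughly the same per-file report can be had by resetting the counters whenever FNR is 1. This is a minimal sketch, not a drop-in replacement: mean_overall is a made-up helper name, and the pattern is anchored with ^ on the assumption that the tag starts the line.

```shell
# Sketch: per-file mean of <Overall> values using only POSIX awk features.
# mean_overall is a hypothetical helper name.
mean_overall() {
  awk -F'>' '
    # Print the mean for the file just finished, if it had any matches.
    function report() {
      if (count > 0) printf "%s: %.1f\n", name, sum / count
    }
    FNR == 1 {                 # first line of each input file
      if (NR > 1) report()     # close out the previous file
      sum = 0; count = 0
      name = FILENAME
      sub(/\.dat$/, "", name)  # strip a trailing .dat extension
    }
    /^<Overall>/ { sum += $2; count++ }
    END { if (NR > 0) report() }
  ' "$@"
}
```

Called as mean_overall hotel*.dat, it prints one line per file that has at least one matching line.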

Upvotes: 2

dawg

Reputation: 104111

If you are just looking for the digits after <Overall> you can do:

awk -F "<Overall>" 'NF>1{sum+=$2;c+=1} END {if (c) print sum/c}' file

prints 2.5 with your example.
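Since you want one mean per file, the same one-liner can be run in a shell loop. A rough sketch, assuming you pass the review files as arguments; per_file_mean is a hypothetical helper name, and the message for files without a match is my own choice:

```shell
# Sketch: run the single-file one-liner once per file.
# per_file_mean is a hypothetical helper name.
per_file_mean() {
  for f in "$@"; do
    # c stays unset if no line contains <Overall>, so guard the division.
    awk -F '<Overall>' -v name="$f" '
      NF > 1 { sum += $2; c += 1 }
      END {
        if (c) print name ": " sum / c
        else   print name ": no <Overall> lines"
      }
    ' "$f"
  done
}
```

For example, per_file_mean hotel*.dat would print one line per matching file.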

If you want the average of all the numeric fields:

awk -F "<|>" '$3~/^-?[0-9.]+$/{a1[$2]+=$3; a2[$2]+=1;} END{ for (e in a1){ print "AVG "e": "a1[e]/a2[e]}}' file

Prints:

AVG Overall: 2.5
AVG Cleanliness: 3
AVG Location: 2.5
AVG Overall Rating: 3.5
AVG Rooms: 2.5
AVG Check in / front desk: 3
AVG Business service: 1.5
AVG No. Reader: 0
AVG Service: 0
AVG Value: 2.5
AVG No. Helpful: 0

Upvotes: 1
