Reputation: 17
I'm trying to find the mean of several numbers in a file, each of which is on a line containing "<Overall>".
My code:
awk -v file=$file '{if ($1~"<Overall>") {rating+=$1; count++;}} {rating=rating/count; print file, rating;}}' $file | sed 's/<Overall>//'
I'm getting
awk: cmd. line:1: (FILENAME=[file] FNR=1) fatal: division by zero attempted
for every file. I can't see why count would be zero if the file does contain a line such as "<Overall>5".
EDIT: Sample from the (very large) input file, as requested:
<Author>RW53
<Content>Location! Location? view from room of nearby freeway
<Date>Dec 26, 2008
<No. Reader>-1
<No. Helpful>-1
<Overall>3
<Value>4
<Rooms>3
<Location>2
<Cleanliness>4
<Check in / front desk>3
<Service>-1
<Business service>-1
Expected output:
[filename] X
where X is the average of the numbers on all the lines containing <Overall>.
Upvotes: 0
Views: 2530
Reputation: 530950
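You aren't waiting until you've completely read the file to compute the average rating: the second action block has no pattern, so it runs for every line, including the first, when count is still 0. This is simpler if you use patterns rather than an if statement. You also need to remove <Overall> from the field before you add it to rating.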
awk '$1 ~ /<Overall>/ {sub(/<Overall>/, "", $1); rating += $1; count++}
END {rating = rating / (count ? count : 1); print FILENAME, rating}' "$file"
(Answer has been updated to fix a typo in the call to sub and to correctly avoid dividing by 0.)
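To see why the division happens too early, it may help to compare an action that has no pattern with an END action on a tiny sample. This is only an illustrative sketch: the temporary file and its two ratings are made up, and the commands print the counter instead of dividing by it.

```shell
# Hypothetical sample with two <Overall> lines, written to a temporary file.
sample=$(mktemp)
printf '<Author>RW53\n<Overall>3\n<Value>4\n<Overall>2\n' > "$sample"

# An action with no pattern runs for EVERY line, so on line 1 (<Author>...)
# count is still 0; this is where the original command divided.
per_line=$(awk '/<Overall>/ {count++} {print FNR ": count=" count+0}' "$sample")

# An END action runs once, after the last line has been read.
at_end=$(awk '/<Overall>/ {count++} END {print "END: count=" count+0}' "$sample")

printf '%s\n%s\n' "$per_line" "$at_end"
rm -f "$sample"
```

The first line of output reports a count of 0, which matches the FNR=1 in the error message; only the END line sees the final count.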
Upvotes: 1
Reputation: 10039
awk -F '>' '
# the field separator is ">"
# for each line that contains <Overall>
/<Overall>/ {
    # add the rating to the sum and increment the counter
    Rate+=$2;Count++}
# at the end of the current file
END{
    # print the average
    printf( "[%s] %f\n", FILENAME, Rate / ( Count + ( ! Count ) ) )
}
' "${File}"
# one liner
awk -F '>' '/<Overall>/{r+=$2;c++}END{printf("[%s] %f\n",FILENAME,r/(c+(!c)))}' "${File}"
Note: ( c + ( ! c ) ) relies on the logical NOT operator (!). The value of !c is 1 if c is 0, and 0 otherwise. So if c is 0 the expression adds 1 to it, and otherwise it adds 0, ensuring a divisor of at least 1.
Upvotes: 0
Reputation: 85550
Use Awk as below:
awk -F'<Overall>' 'NF==2 {sum+=$2; count++}
END{printf "[%s] %s\n",FILENAME,(count?sum/count:0)}' file
For an input file named input-file that contains two <Overall> lines, like this:
<Author>RW53
<Content>Location! Location? view from room of nearby freeway
<Date>Dec 26, 2008
<No. Reader>-1
<No. Helpful>-1
<Overall>3
<Value>4
<Rooms>3
<Location>2
<Cleanliness>4
<Check in / front desk>3
<Service>-1
<Business service>-1
<Overall>2
Running it produces,
[input-file] 2.5
The -F'<Overall>' part splits each input line on the delimiter <Overall>, so only lines that contain <Overall> followed by a number have exactly two fields (NF==2). The number is $2, which is added to the sum variable, and the number of matching lines is tracked in count.
The END clause is executed after all lines have been read. It prints the file name using the awk special variable FILENAME, which holds the name of the file being processed, and it calculates the average only if count is non-zero (otherwise it prints 0).
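Since the question mentions running this "for every file", the command above could be wrapped in a small shell function and called in a loop. The sketch below is only one way to do it: the function name overall_mean, the .dat extension and the sample contents are all made up for illustration.

```shell
# Hypothetical helper: print "[file] average" for one review file.
overall_mean() {
    awk -F'<Overall>' '
        NF == 2 { sum += $2; count++ }   # line contains <Overall>
        END     { printf "[%s] %s\n", FILENAME, (count ? sum / count : 0) }
    ' "$1"
}

# Made-up sample files in a temporary directory.
dir=$(mktemp -d)
printf '<Overall>3\n<Value>4\n<Overall>2\n' > "$dir/a.dat"
printf '<Author>RW53\n<Value>4\n'           > "$dir/b.dat"   # no <Overall> line

# One awk process per file, so sum and count start from zero each time.
for f in "$dir"/*.dat; do
    overall_mean "$f"
done
rm -rf "$dir"
```

Starting a fresh awk for each file keeps the totals separate without any manual resetting. With GNU awk, an ENDFILE block could do the same in a single process, but that is not portable to other awk implementations.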
Upvotes: 4