John Smith

Reputation: 719

How to sum up numbers in my file?

I have a folder, my_folder, which contains over 800 files named myfile_*, where * is the unique ID of each file. Each file consists of a repeated set of fields, but the only one I am interested in is the <rating> field. Lines for this field look like <rating>n, where n is the rating score. They occur every 14 lines, starting at line 10 (that is, on lines 10 + 14i), until the end of the file. My job is to write a script, myscript.sh, that sums all values of n for each file in the folder and then sorts the results from highest to lowest. The output would look as follows:

myfile_1234 5112
myfile_5214 2134
myfile_6124 1233
...

where the number after each file name is the sum of n for that file. My files vary dramatically in length, from as few as 20 fields to as many as 2500. How would I go about doing this? I figure I will use some form of grep command to find occurrences of <rating> and then sum the numbers that follow them, or maybe I could use the fact that the lines occur at positions 10 + 14i. Thanks for your time; any suggestions are much appreciated.

Input File:

<Overall Rating>2.5
<Avg. Price>$155
<URL>

<Author>Jeter5
<Content>I hope we're not disappointed! We enjoyed New Orleans...
<Date>Dec 19, 2008
<No. Reader>-1
<No. Helpful>-1
<rating>4
<Value>-1
<Rooms>3
<Location>5
<Cleanliness>3
<Check in / front desk>5
<Service>5
<Business service>5

<Author>...
repeat fields again...

The script must take the folder name as an argument in the command line, such as ./myscript.sh my_folder

Upvotes: 0

Views: 828

Answers (2)

Chris Lear

Reputation: 6742

Here's my solution:

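#!/bin/bash
# Folder to scan, passed as the first argument
dir=$1

# grep prints "file:n" for each rating line, awk sums n per file, sort puts the highest sum first
grep -P -o '(?<=<rating>).*' "$dir"/* | awk -F: '{A[$1]+=$2;next}END{for(i in A){print i,A[i]}}' | sort -k2,2nr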

Looks like the sort at the end wasn't needed, so you could remove that.
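
As a rough sketch only, the same pipeline could be wrapped in a function so that the script works from the folder argument and lists the highest total first. The function name sum_ratings is made up for illustration, the myfile_* glob is taken from the question, and -H and -o are assumed to be available (they are GNU/BSD grep extensions, not POSIX). File names containing spaces are not handled.

```shell
#!/bin/bash
# sum_ratings is a hypothetical helper name; it prints "<file> <sum>" for each
# file that has at least one <rating> line, largest sum first.
sum_ratings() {
  local dir=$1
  # -H forces the "file:" prefix even if the folder holds a single file.
  grep -H -o '<rating>.*' "$dir"/myfile_* |
    awk -F'<rating>' '
      { sub(/:$/, "", $1); total[$1] += $2 }   # $1 is "file:", $2 is the score
      END { for (f in total) print f, total[f] }
    ' |
    sort -k2,2nr
}

# Run only when a folder argument is supplied, e.g. ./myscript.sh my_folder
if [ "$#" -ge 1 ]; then
  sum_ratings "$1"
fi
```

Splitting on the literal <rating> tag instead of on ":" avoids relying on a Perl-style lookbehind. Files with no <rating> line at all do not appear in this output.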

Upvotes: 2

ClaudioM

Reputation: 1446

You could use awk and not worry about the starting line.

If I understood correctly, when you run the following command:

grep rating fileName.txt

you'll get something like this (I've created a sample input file):

grep "<rating>" myfile_12345
<rating>7
<rating>1
<rating>2

You can use this awk command:

awk -F"<rating>" 'BEGIN{sum=0}{sum+=$2}END{print sum}' myfile_12345

output:

10
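
For comparison, the position-based idea from the question (ratings on lines 10, 24, 38, ...) might be sketched as below. It trusts that the 14-line layout never drifts, which is why matching on the tag is probably the safer choice. The function name sum_by_position is invented for this example.

```shell
# sum_by_position is a hypothetical name; it relies on the 10 + 14i layout
# described in the question instead of matching the <rating> tag.
sum_by_position() {
  awk 'FNR >= 10 && (FNR - 10) % 14 == 0 {
         sub(/^<rating>/, "")   # strip the tag so only the number remains
         sum += $0
       }
       END { print sum + 0 }' "$1"
}
```

Running both variants on a few files and comparing the totals would be a cheap way to check whether every file really follows the layout.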

Then you can run that awk command in a for loop:

for file in $(find . -name "myfile_*")
do
  # Print the file name, then the sum of its ratings on the same line
  printf "%s " "$file"
  awk -F"<rating>" 'BEGIN{sum=0}{sum+=$2}END{printf " %s\n", sum}' "$file"
done

output:

./myfile_12345  10
./myfile_17676  19
./myfile_9898  24
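
The loop prints the files in whatever order find returns them. A possible way to get the highest-to-lowest order the question asks for is to let one awk call read every file and pipe the result to sort. This is only a sketch: rating_totals is a hypothetical name, and it assumes file names without spaces.

```shell
#!/bin/bash
# rating_totals is a hypothetical helper name, not part of the answer above.
rating_totals() {
  local dir=$1
  (
    # Work inside the folder so names print as myfile_1234, without a path prefix.
    cd "$dir" || exit 1
    awk -F'<rating>' '
      FNR == 1 { total[FILENAME] += 0 }   # register each non-empty file, even with no ratings
      NF > 1   { total[FILENAME] += $2 }  # line contains <rating>: add the number after it
      END      { for (f in total) print f, total[f] }
    ' myfile_* | sort -k2,2nr
  )
}

# Run only when a folder argument is supplied, e.g. ./myscript.sh my_folder
if [ "$#" -ge 1 ]; then
  rating_totals "$1"
fi
```

Starting one awk process for all files, instead of one per file, should also be noticeably faster with 800 files.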

Best Regards

Claudio

Upvotes: 2
