Reputation: 719
I have a folder, my_folder, which contains over 800 files named myfile_*, where * is a unique ID for each file. Each file contains a variety of repeated fields, but the one I am interested in is the <rating> field. Lines of this field look like <rating>n, where n is the rating score. These lines occur every 14th line, starting at line 10 (10 + 14i), until the end of the file. My task is to write a script, myscript.sh, that sums all values of n per file in my folder and then sorts the results from highest to lowest. The output would look as follows:
myfile_1234 5112
myfile_5214 2134
myfile_6124 1233
...
where the number in the second column is the sum of n for that file. My files vary dramatically in length, from as few as 20 fields to as many as 2500. How would I go about doing this? I figure I could use some form of grep command to find occurrences of <rating> and then sum the numbers that follow them, or maybe use the fact that these lines occur at lines 10 + 14i, starting at 10. Thanks for your time; any suggestions are much appreciated.
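The line-number idea can be sketched with awk's per-file line counter, FNR. This is only a minimal sketch: the helper name sum_ratings_by_line is hypothetical, and it assumes every line at 10 + 14i really has the form <rating>n. In the sample input as pasted below, the first <rating> line is line 9 and consecutive records are 13 lines apart, so check both the offset and the period against a real file before relying on it. Matching the <rating> tag itself, as the answers below do, doesn't depend on the layout at all.

```shell
# Sketch of the line-number approach (sum_ratings_by_line is a hypothetical name).
# Prints "file total" for each file given, summing the value found on
# lines 10, 24, 38, ... (FNR == 10 + 14*i). The offset 10 and period 14
# are taken from the question and are NOT verified by this code.
sum_ratings_by_line() {
    awk '
        FNR >= 10 && (FNR - 10) % 14 == 0 {
            value = $0
            sub(/^<rating>/, "", value)   # drop the tag, keep the number
            total[FILENAME] += value
        }
        END {
            for (f in total) print f, total[f]
        }
    ' "$@"
}
```

For example, sum_ratings_by_line my_folder/myfile_* | sort -k2,2nr would list the per-file totals from highest to lowest. Files shorter than 10 lines produce no output line.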
Input File:
<Overall Rating>2.5
<Avg. Price>$155
<URL>
<Author>Jeter5
<Content>I hope we're not disappointed! We enjoyed New Orleans...
<Date>Dec 19, 2008
<No. Reader>-1
<No. Helpful>-1
<rating>4
<Value>-1
<Rooms>3
<Location>5
<Cleanliness>3
<Check in / front desk>5
<Service>5
<Business service>5
<Author>...
repeat fields again...
The script must take the folder name as a command-line argument, e.g. ./myscript.sh my_folder
Upvotes: 0
Views: 828
Reputation: 6742
Here's my solution:
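#!/bin/bash
dir=$1
grep -H -P -o '(?<=<rating>).*' "$dir"/* | awk -F: '{sub(/.*\//, "", $1); A[$1]+=$2} END{for(i in A) print i, A[i]}' | sort -k2,2nr
The sort at the end is what puts the totals in order from highest to lowest; awk's for (i in A) loop visits the keys in an unspecified order, so keep it. The -H flag makes grep print the file name even if the folder holds only one file, the sub() call strips the folder prefix from each name, and -P requires GNU grep.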
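Since -P is a GNU grep extension and isn't available in every grep, the same result can be had from a single awk pass, using awk's FILENAME variable to key the totals. A minimal sketch, assuming the tag is exactly <rating> at the start of the line; the function name sum_ratings is hypothetical:

```shell
# Portable variant (sum_ratings is a hypothetical name): one awk pass over
# every myfile_* in the folder, no grep -P needed. Prints "name total",
# highest total first, with the folder part stripped from each name.
# Files containing no <rating> lines do not appear in the output.
sum_ratings() {
    dir=$1
    awk '
        /^<rating>/ {
            value = $0
            sub(/^<rating>/, "", value)   # keep only the number
            total[FILENAME] += value
        }
        END {
            for (f in total) {
                name = f
                sub(/.*\//, "", name)     # my_folder/myfile_1 -> myfile_1
                print name, total[f]
            }
        }
    ' "$dir"/myfile_* | sort -k2,2nr
}
```

Called as sum_ratings my_folder, it prints lines in the myfile_1234 5112 format from the question.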
Upvotes: 2
Reputation: 1446
You could use awk and not worry about the starting line.
If I understand correctly, running a command such as
grep rating fileName.txt
gives you something like this (I've created a sample input file):
grep "<rating>" myfile_12345
<rating>7
<rating>1
<rating>2
You can then sum the values with this awk command:
awk -F"<rating>" 'BEGIN{sum=0}{sum+=$2}END{print sum}' myfile_12345
Output:
10
Then you can use it in a for loop:
for file in $(find . -name "myfile_*")
do
    printf "%s " "$file"
    awk -F"<rating>" 'BEGIN{sum=0}{sum+=$2}END{print sum}' "$file"
done
Output:
./myfile_12345 10
./myfile_17676 19
./myfile_9898 24
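The loop above searches the current directory rather than a folder passed on the command line, and prints files in whatever order find returns them. A sketch of wrapping the same per-file awk call into the required myscript.sh, with a usage check and a descending sort; rating_report is a hypothetical name, and the glob assumes the files are directly inside the folder:

```shell
# Core of a possible myscript.sh (rating_report is a hypothetical name).
# For each myfile_* in the given folder, print "name total", highest first.
rating_report() {
    if [ "$#" -ne 1 ] || [ ! -d "$1" ]; then
        echo "usage: rating_report folder" >&2
        return 1
    fi
    for file in "$1"/myfile_*; do
        [ -f "$file" ] || continue            # the glob matched nothing
        printf '%s ' "$(basename "$file")"    # drop the folder prefix
        # Lines without the tag have an empty $2, which adds 0;
        # "sum + 0" prints 0 for a file with no ratings at all.
        awk -F'<rating>' '{ sum += $2 } END { print sum + 0 }' "$file"
    done | sort -k2,2nr
}
```

To turn this into myscript.sh, put the function in the file under a #!/bin/sh line, follow it with rating_report "$1", make the file executable, and run ./myscript.sh my_folder.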
Best Regards
Claudio
Upvotes: 2