Reputation: 47
I want to count certain lines in each file: I count both the lines I need and the lines I don't need, then subtract one from the other. But the subtraction never shows up in the output.
Here is my script:
#!/bin/bash
for file in "$1"/*;
do
cat "$file" | while read line;
do
countContent1="$(grep '\(<Content>\)' | wc -l)"
countContent2="$(grep '\(showReview\)' | wc -l)"
valuableReviews="$(($countContent1-$countContent2))"
echo "$(b=${file##*/}; echo ${b%.*})" $valuableReviews
done
done | sort -r -n -k 2
Note that both <Content> and showReview are on the same line in the file. The output is only the number of lines containing <Content>; there's no subtraction.
Here is part of the file:
<Author>lass=
<Content>Empfehlenswert.... showReview(11348491, 'full');
<Date>Sep 28, 2006
<No. Reader>-1
<No. Helpful>-1
<Overall>4
<Value>-1
<Rooms>4
<Location>-1
<Cleanliness>5
<Check in / front desk>-1
<Service>4
<Business service>-1
Upvotes: 0
Views: 2654
Reputation: 2705
:>cat file1.txt
<Author>lass=
<Content>Empfehlenswert.... showReview(11348491, 'full');
<Date>Sep 28, 2006
<No. Reader>-1
<No. Helpful>-1
<Overall>4
<Value>-1
<Rooms>4
<Location>-1
<Cleanliness>5
<Check in / front desk>-1
<Service>4
<Business service>-1
:>echo -e "Lines with content $(grep -c Content file1.txt)\nLines with showReview $(grep -c showReview file1.txt)"
Lines with content 1
Lines with showReview 1
:>
grep -c Content file1.txt -- prints the count of lines matching the pattern
$(...) --> runs a command and substitutes its output
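Building on those two counts, the subtraction the question asks for could be sketched roughly like this. The helper name count_valuable and the sample lines are made up for illustration; only grep -c and $((...)) come from the thread.

```shell
#!/bin/bash
# count_valuable is a hypothetical helper name, not something from the question.
# It prints (lines matching <Content>) minus (lines matching showReview) for one file.
count_valuable() {
  local file=$1 content withShowReview
  # grep -c prints the number of matching lines, not the number of matches.
  content=$(grep -c '<Content>' "$file")
  withShowReview=$(grep -c 'showReview' "$file")
  # Arithmetic expansion does the subtraction inside the shell.
  echo "$((content - withShowReview))"
}

# Throwaway sample data, invented for this sketch.
sample=$(mktemp)
printf '%s\n' \
  '<Author>lass=' \
  "<Content>Empfehlenswert.... showReview(11348491, 'full');" \
  '<Content>A second review without the marker' \
  '<Date>Sep 28, 2006' >"$sample"

echo "valuable reviews: $(count_valuable "$sample")"   # prints: valuable reviews: 1
rm -f "$sample"
```

Because grep is given the file name directly, each count sees the whole file.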
Upvotes: 0
Reputation: 295629
This makes more sense if you take out the inner while read loop:
#!/bin/bash
for file in "$1"/*; do
countContent1=$(grep -c '[<]Content[>]' <"$file")
countContent2=$(grep -c 'showReview' <"$file")
valuableReviews=$((countContent1 - countContent2))
b=${file##*/}; b=${b%.*}
echo "$b $valuableReviews"
done | sort -r -n -k 2
Note:
We're redirecting "$file" into each copy of grep, so grep is counting content in the file instead of content on stdin.
We've taken out the while read loop entirely, and are letting grep iterate over the individual lines of each file, rather than trying to do that in bash. (Consequently, we now run grep twice per file, not twice per line of each file.)
We compute b with parameter expansions directly instead of inside a command substitution, because $(...) has a significant performance penalty (lower than running an external command, but still much higher than doing everything in the parent process).
It would be faster still to replace the entire program with just one copy of awk:
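#!/bin/awk -f
# On the first line of each new file, report the previous file and reset.
# This rule comes first so that line 1 is counted for the right file.
FILENAME != fn {
  if(fn) { print(fn, ": ", (allContent - withShowReview)); }
  allContent = 0; withShowReview = 0; fn = FILENAME;
}
/[<]Content[>]/ {
  ++allContent
  # Content lines that also contain showReview are the ones to subtract.
  if ($0 ~ /showReview/) {
    ++withShowReview
  }
}
END {
  print(fn, ": ", (allContent - withShowReview))
}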
...called as ./theAwkScript "$1"/*
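As for why the original loop never subtracted: read takes the first line, the first grep then reads everything left on the loop's stdin, and the second grep finds nothing left to read. A minimal sketch reproducing that in isolation (the function name demo_shared_stdin and the three sample lines are made up):

```shell
#!/bin/bash
# demo_shared_stdin is a hypothetical name; the input below is invented sample data.
demo_shared_stdin() {
  printf '%s\n' '<Author>x' '<Content>one showReview(1)' '<Content>two' |
  while read -r line; do
    # read has consumed the first line; both greps inherit the loop's stdin.
    first=$(grep -c '<Content>' || true)    # reads all remaining lines up to EOF
    second=$(grep -c 'showReview' || true)  # stdin is already at EOF, so this is 0
    echo "first=$first second=$second"
  done
}

demo_shared_stdin   # prints: first=2 second=0
```

The loop body runs only once, because nothing is left for the next read. The || true is there only because grep exits non-zero when it matches nothing.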
Upvotes: 1