Rara

Reputation: 47

How to subtract the numbers from wc -l output in a bash script?

I want to exclude certain lines from a count, so I count both the lines I need and the lines I don't need and subtract one from the other. But somehow the subtraction has no effect on the output.

Here is my script:

#!/bin/bash

for file in "$1"/*;
do
    cat "$file" | while read line;
do
    countContent1="$(grep '\(<Content>\)' | wc -l)"
    countContent2="$(grep '\(showReview\)' | wc -l)"
    valuableReviews="$(($countContent1-$countContent2))"
    echo "$(b=${file##*/}; echo ${b%.*})" $valuableReviews
done
done | sort -r -n -k 2

Note that both <Content> and showReview are on the same line in the file. The output is only the number of lines containing <Content>; no subtraction happens.

Here is part of the file:

<Author>lass=
<Content>Empfehlenswert....   showReview(11348491, 'full');  
<Date>Sep 28, 2006
<No. Reader>-1
<No. Helpful>-1
<Overall>4
<Value>-1
<Rooms>4
<Location>-1
<Cleanliness>5
<Check in / front desk>-1
<Service>4
<Business service>-1

Upvotes: 0

Views: 2654

Answers (2)

Digvijay S

Reputation: 2705

    :>cat file1.txt
    <Author>lass=
    <Content>Empfehlenswert....   showReview(11348491, 'full');
    <Date>Sep 28, 2006
    <No. Reader>-1
    <No. Helpful>-1
    <Overall>4
    <Value>-1
    <Rooms>4
    <Location>-1
    <Cleanliness>5
    <Check in / front desk>-1
    <Service>4
    <Business service>-1

    :>echo -e  "Lines with content $(grep -c Content file1.txt)\nLines with showReview $(grep -c showReview file1.txt)"
    Lines with content 1
    Lines with showReview 1
    :>
grep -c Content file1.txt prints the number of lines matching the pattern.
$(...) is command substitution: the shell runs the command inside and substitutes its output.

Upvotes: 0

Charles Duffy

Reputation: 295629

This makes more sense if you take out the inner while read loop:

#!/bin/bash

for file in "$1"/*; do
    countContent1=$(grep -c '[<]Content[>]' <"$file")
    countContent2=$(grep -c 'showReview' <"$file")
    valuableReviews=$((countContent1 - countContent2))
    b=${file##*/}; b=${b%.*}
    echo "$b $valuableReviews"
done | sort -r -n -k 2

Note:

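  • We're redirecting "$file" into each copy of grep, so each grep reads the whole file from the start. In the original, neither grep was given a file, so both read the loop's stdin: the first grep consumed everything left after read took the first line, and the second found nothing to read and counted 0. That is why the subtraction appeared to do nothing.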
  • We've removed the while read loop entirely, and are letting grep iterate over the individual lines of each file, rather than trying to do that in bash. (Consequently, we now run grep twice per file, not twice per line of each file).
  • We aren't using command substitutions unnecessarily. $(...) has a significant performance penalty (lower than running an external command, but still much higher than doing everything in the parent process).
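
The stdin behaviour behind the first note can be checked directly. Below is a minimal sketch, assuming three made-up input lines and a hypothetical function name, demo_stdin_drain. It mirrors the shape of the loop in the question rather than reproducing it exactly:

```shell
#!/bin/bash
# demo_stdin_drain is a hypothetical name used only for this illustration.
demo_stdin_drain() {
    # Three sample lines (invented for the demo) are piped into the loop.
    printf '%s\n' '<Author>a' '<Content>x showReview(1)' '<Date>d' |
    while read -r line; do
        # No file argument: this grep reads whatever is left on the loop's stdin.
        first=$(grep -c 'Content')
        # stdin is now at end-of-file, so this grep sees no input and prints 0.
        # grep exits non-zero when nothing matches; "|| true" ignores that status.
        second=$(grep -c 'showReview') || true
        echo "first=$first second=$second"
    done
}

demo_stdin_drain
```

This prints first=1 second=0, and the loop body runs only once: by the time read is called again, there is nothing left on stdin.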

It would be faster still to replace the entire program with just one copy of awk:

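#!/bin/awk -f

# This rule must come first: on the first line of each new file, report the
# previous file's total and reset, before that line itself is counted.
FILENAME != fn {
  if (fn != "") { print fn ": " (allContent - reviewLines) }
  allContent = 0; reviewLines = 0; fn = FILENAME
}
/[<]Content[>]/ {
  ++allContent
  if ($0 ~ /showReview/) {
    ++reviewLines   # a <Content> line that also contains showReview
  }
}
END {
  if (fn != "") { print fn ": " (allContent - reviewLines) }
}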

...called as ./theAwkScript "$1"/*
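
For output closer to the bash version's sorted listing, the same single-pass idea can be wrapped in a shell function and piped through sort. This is a sketch under assumptions, not the script above: per_file_counts is a hypothetical name, it prints the full path rather than the stripped base name, and it assumes file names contain no whitespace (otherwise sort -k2 would pick the wrong field). It counts lines that contain <Content> but not showReview, which is the same number as allContent minus the <Content> lines that also mention showReview.

```shell
#!/bin/bash
# per_file_counts is a hypothetical helper name, not an existing tool.
# Usage: per_file_counts "$1"/*
per_file_counts() {
    awk '
        # First line of each file: report the previous file, then reset.
        FNR == 1 { if (fn != "") print fn, n; fn = FILENAME; n = 0 }
        # Count <Content> lines that do not also contain showReview.
        /[<]Content[>]/ && !/showReview/ { ++n }
        # Report the last file; nothing is printed if no file had any lines.
        END { if (fn != "") print fn, n }
    ' "$@" | sort -rn -k2    # highest count first
}
```

Because awk only updates FILENAME when it reads a line, empty files never trigger the FNR == 1 rule and are left out of the listing.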

Upvotes: 1
