Reputation: 351

AWK issue : counting "non-matches"

I want to count the occurrences of some words in a file. I then want to modify my code to also count how many lines did not match any of the words.

For example here is my input file (test.txt):

fred
fred
fred
bob
bob
john
BILL
BILL

Here is my code:

awk '
    /fred/ { count["fred"]++ }
    /bob/ { count["bob"]++ }
    /john/ { count["john"]++ }
   END      { for (name in count) print name, "was found on", count[name], "lines." }
   ' test.txt

This works fine and gives me this output:

john was found on 1 lines.
bob was found on 2 lines.
fred was found on 3 lines.

Now I want to get a count of the lines that didn't match, so I wrote the following code:

awk '
    found=0
    /fred/ { count["fred"]++; found=1 }
    /bob/ { count["bob"]++; found=1 }
    /john/ { count["john"]++; found=1 }
    if (found==0) { count["none"]++ }
   END      { for (name in count) print name, "was found on", count[name], "lines." }
   ' test.txt

I get an error on the if statement like this:

awk: syntax error at source line 6
 context is
        >>>  if <<<  (found==0) { count["none"]++; }
awk: bailing out at source line 8

Any ideas why this isn't working?

Upvotes: 1

Answers (3)

thanasisp

Reputation: 5965

You have simple syntax errors in how you use conditions. This statement is not valid:

awk 'if (found==0) { count["none"]++ }'  # syntax error

because if (...) is a statement, not a pattern, so it cannot appear outside {}. You should use either:

awk '{ if (found==0) count["none"]++ }'

awk 'found==0{ count["none"]++ }'

Also, found = 0 at the beginning of your script should be inside {}, as it is meant as a statement; outside {}, awk treats it as a pattern (an assignment expression) rather than as an action. Patterns go outside and in front of {}, and actions go inside {}.
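To make the pattern/action split concrete, here is a minimal sketch. The function name count_unmatched and the sample input are made up for illustration; the three names are the ones from the question. It shows an action with no pattern, a regex pattern, and an expression pattern side by side.

```shell
# count_unmatched is a hypothetical helper: it reads stdin and prints
# how many lines contain none of fred/bob/john.
count_unmatched() {
    awk '
        { found = 0 }                   # action without a pattern: runs for every line
        /fred|bob|john/ { found = 1 }   # regex pattern in front of an action
        found == 0 { none++ }           # expression pattern in front of an action
        END { print none + 0 }          # +0 forces a numeric 0 when nothing was counted
    '
}

printf '%s\n' fred bob BILL BILL | count_unmatched
```

With this made-up input the two BILL lines are the only ones left unmatched, so the sketch prints 2.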

Your script with only the necessary modifications could be:

BEGIN { count["fred"]; count["bob"]; count["john"]; count["none"] }
{ found = 0 }
/fred/ { count["fred"]++; found=1 }
/bob/ { count["bob"]++; found=1 }
/john/ { count["john"]++; found=1 }
found==0{ count["none"]++ }
END { for (name in count) print name, "was found on", count[name]+0, "lines." }

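Two problems corrected: the bare if statement is now an expression pattern (found==0), and found = 0 is now inside an action.
Added initialisation of the items, because without it no line is printed for "fred" if there is no "fred" at all.
Added count[name]+0 so that an item still holding an empty string prints as zero.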
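The effect of the initialisation and of the +0 can be seen in a small sketch. The function name show_zero_counts and the two-line input are assumptions chosen for illustration, and the output is piped through sort only because the order of for (name in count) is unspecified.

```shell
# show_zero_counts is a hypothetical name; the input deliberately has no "bob".
show_zero_counts() {
    printf 'fred\nfred\n' | awk '
        BEGIN { count["fred"]; count["bob"] }   # a bare reference creates the element
        /fred/ { count["fred"]++ }
        /bob/  { count["bob"]++ }
        END { for (name in count) print name, "was found on", count[name] + 0, "lines." }
    ' | sort   # for-in order is unspecified, so sort for stable output
}

show_zero_counts
```

This should print "bob was found on 0 lines." followed by "fred was found on 2 lines."; without the BEGIN block the bob line would be missing entirely.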

Upvotes: 2

kvantour

Reputation: 26471

There are a couple of ways you can achieve what you want. While the method that the OP presents works, it is not very flexible. Assume you have a string str which contains your words of interest:

awk -v str="fred bob john"                 \
    'BEGIN{split(str,b);for(i in b) a[b[i]]; delete b }
     ($0 in a) {a[$0]++; c++}
     END {for(i in a) print i,"was found",a[i]+0,"times"
          print NR-c, "lines did not match" }' file1 file2 file3
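As a sketch of how this behaves on the question's sample data, the same idea can be wrapped in a shell function. The name count_words is hypothetical, and the sort/close step is an addition here to get a stable order, not part of the answer above. Note that ($0 in a) requires the whole line to equal one of the words, whereas /fred/ in the question matches anywhere in the line.

```shell
# count_words is a hypothetical wrapper; the word list is passed as $1,
# the text to scan is read from stdin.
count_words() {
    awk -v str="$1" '
        BEGIN { n = split(str, b); for (i = 1; i <= n; i++) a[b[i]] }
        ($0 in a) { a[$0]++; c++ }            # whole line must equal a listed word
        END {
            for (w in a) print w, "was found", a[w] + 0, "times" | "sort"
            close("sort")                     # flush sorted lines before the summary
            print NR - c, "lines did not match"
        }
    '
}

printf '%s\n' fred fred fred bob bob john BILL BILL | count_words "fred bob john"
```

On that input the sketch reports 2, 3 and 1 for bob, fred and john, and then "2 lines did not match" for the two BILL lines.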

Upvotes: 2

RavinderSingh13

Reputation: 133458

Could you please try the following, assuming that you want to print the names which occur only once. You do NOT need to set the same variable in every block, because it may give false positive results. It is better to check the count stored in the array directly in a condition.

awk '
/fred/{ count["fred"]++ }
/bob/{ count["bob"]++}
/john/{ count["john"]++}
END{
  for(name in count){
     if(count[name]==1){
       print name, "was found only 1 time"
     }
  }
}
'  Input_file

NOTE: Regarding your syntax error, awk works on a condition-then-action model: when a condition is true, the action attached to it is performed, e.g. /test/{print "something...."}. In your case you wrote an action directly (assigning a value to a variable) without braces; it would have worked as an action if you had used {found=0}. This note only addresses the syntax error part of your question.

Upvotes: 2
