Reputation: 351
I want to count the occurrences of some words in a file. I then modified my code to also count how many lines did not match any word.
For example here is my input file (test.txt):
fred
fred
fred
bob
bob
john
BILL
BILL
Here is my code:
awk '
/fred/ { count["fred"]++ }
/bob/ { count["bob"]++ }
/john/ { count["john"]++ }
END { for (name in count) print name, "was found on", count[name], "lines." }
' test.txt
This works fine and gives me this output:
john was found on 1 lines.
bob was found on 2 lines.
fred was found on 3 lines.
Now I want to get a count of the lines that didn't match, so I wrote the following code:
awk '
found=0
/fred/ { count["fred"]++; found=1 }
/bob/ { count["bob"]++; found=1 }
/john/ { count["john"]++; found=1 }
if (found==0) { count["none"]++ }
END { for (name in count) print name, "was found on", count[name], "lines." }
' test.txt
I get an error on the if statement like this:
awk: syntax error at source line 6
context is
>>> if <<< (found==0) { count["none"]++; }
awk: bailing out at source line 8
Any ideas why this isn't working?
Upvotes: 1
Views: 193
Reputation: 5965
You have simple syntax errors in how you use conditions. This statement is not valid:
awk 'if (found==0) { count["none"]++ }' # syntax error
because if () does not form a condition that could exist outside {}. You should use either:
awk '{ if (found==0) count["none"]++ }'
or
awk 'found==0{ count["none"]++ }'
Also, found = 0 at the beginning of your script should be inside {}, as it is also a statement; outside the braces awk parses it as a pattern. In general, patterns go outside and in front of {}, and actions go inside {}.
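To illustrate the two valid forms side by side, here is a minimal sketch that recreates the test.txt from the question and counts the non-matching lines both ways. The single combined regex /fred|bob|john/ and the variable names none, form1 and form2 are only for this demo and are not part of the original script.

```shell
# Recreate the sample input from the question.
printf '%s\n' fred fred fred bob bob john BILL BILL > test.txt

# Form 1: the if statement sits inside an action block.
form1=$(awk '
{ found = 0 }
/fred|bob|john/ { found = 1 }
{ if (found == 0) none++ }
END { print none + 0 }
' test.txt)

# Form 2: the same test written as a pattern in front of the block.
form2=$(awk '
{ found = 0 }
/fred|bob|john/ { found = 1 }
found == 0 { none++ }
END { print none + 0 }
' test.txt)

echo "form1: $form1, form2: $form2"
```

Both forms should report 2 here, since only the two BILL lines match none of the names.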
Your script with only the necessary modifications could be:
BEGIN { count["fred"]; count["bob"]; count["john"]; count["none"] }
{ found = 0 }
/fred/ { count["fred"]++; found=1 }
/bob/ { count["bob"]++; found=1 }
/john/ { count["john"]++; found=1 }
found==0{ count["none"]++ }
END { for (name in count) print name, "was found on", count[name]+0, "lines." }
count[name]+0 is used so that if the item is an empty string, zero is printed.
Upvotes: 2
Reputation: 26471
There are a couple of ways you can achieve what you want. While the method that the OP presents works, it is not really flexible. We assume you have a string str which contains your words of interest:
awk -v str="fred bob john" \
'BEGIN{split(str,b);for(i in b) a[b[i]]; delete b }
($0 in a) {a[$0]++; c++}
END {for(i in a) print i,"was found",a[i]+0,"times"
print NR-c, "lines did not match" }' file1 file2 file3
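As a rough sketch of how this behaves on the sample test.txt from the question, the variant below keeps the split array b around so the results print in the order the words were given (the order of for (i in a) is unspecified in awk). The extra word alice is a made-up entry to show a zero count, and result is just a shell variable for the demo. Note that ($0 in a) only matches when the whole line equals one of the words.

```shell
# Recreate the sample input from the question.
printf '%s\n' fred fred fred bob bob john BILL BILL > test.txt

result=$(awk -v str="fred bob john alice" '
BEGIN { n = split(str, b); for (i = 1; i <= n; i++) a[b[i]] = 0 }
# Count a line only when it is exactly one of the words of interest.
($0 in a) { a[$0]++; c++ }
END {
    # Walk the split order rather than "for (i in a)" so the output order is stable.
    for (i = 1; i <= n; i++) print b[i], "was found", a[b[i]], "times"
    print NR - c, "lines did not match"
}
' test.txt)

echo "$result"
```

Changing the set of words then only means editing str, not the awk program itself.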
Upvotes: 2
Reputation: 133458
Could you please try the following, assuming you want to print only the names that occur exactly 1 time. You do NOT need to set the same variable for each array value, because it may give false positive results. It is better to check the count stored in the array directly in the condition.
awk '
/fred/{ count["fred"]++ }
/bob/{ count["bob"]++}
/john/{ count["john"]++}
END{
for(name in count){
if(count[name]==1){
print name, "was found only 1 time ", name
}
}
}
' Input_file
NOTE: Regarding your syntax error, awk works on the model of condition followed by action: when a condition is true, the action attached to it is performed, e.g. /test/{print "something...."}. In your case you wrote an action directly (assigning a value to a variable), which would have worked if you had put it inside braces, as in {found=1}. This is just to answer the syntax error part of your question.
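A small sketch of the condition/action model, using a throwaway three-line file (sample.txt and the shell variable names are invented for this demo): a condition with no action prints the matching line, an action with no condition runs on every line, and the two together run the action only on matching lines.

```shell
# Throwaway input for the demo.
printf '%s\n' fred bob test > sample.txt

# Condition only: the default action prints each matching line.
only_pattern=$(awk '/test/' sample.txt)

# Action only: with no condition, the block runs on every line.
only_action=$(awk '{ n++ } END { print n }' sample.txt)

# Condition and action together: the block runs only on matching lines.
both=$(awk '/test/ { print "something...." }' sample.txt)

echo "$only_pattern | $only_action | $both"
```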
Upvotes: 2