sachinv

Reputation: 502

Filter Exact Match with Grep

I have been using grep -w extensively but recently I noticed that it is not solving my problem.

Let's say I have a file with the following contents:

$  cat Log.txt
aaa xxx zzz 
bbb xxx aa
cccaaa yy aa
scn-aaa

I want to select only the rows that contain "aaa" as an exact word. That means rows where it appears only inside "cccaaa" or "scn-aaa" should not be returned.

I tried grep -w, but no luck:

$ grep -w "aaa" Log.txt
aaa xxx zzz
scn-aaa

$ grep -w "\<aaa\>" Log.txt
aaa xxx zzz
scn-aaa

I also tried -Fx, but it did not help.

Please let me know how I can achieve this with the grep command.

Note: Each line may have multiple columns, and the number of columns is not fixed.

Upvotes: 1

Views: 1592

Answers (4)

sachinv

Reputation: 502

I tried many of the suggestions, but what worked best for me is the following command:

grep -E '(^|\s)'<PATTERN>'($|\s)' <FILENAME>

Below is an example:

$ cat Log.txt
aaa xxx zzz
bbb xxx aa
cccaaa yy aa
scn-aaa

$ i=aaa

$ grep -E '(^|\s)'"${i}"'($|\s)' Log.txt
aaa xxx zzz
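
The same idea can be wrapped in a small function, as a rough sketch. The name match_word is hypothetical, [[:space:]] is used in place of \s (which is a GNU extension), and the word is still interpreted as a regular expression, so metacharacters in it would need escaping.

```shell
# match_word is a hypothetical helper name, not part of grep.
# $1: word to look for (still treated as a regex), $2: file to search
match_word() {
    grep -E "(^|[[:space:]])$1(\$|[[:space:]])" "$2"
}

# Recreate the sample file from the question in a temporary location.
log=$(mktemp)
printf '%s\n' 'aaa xxx zzz' 'bbb xxx aa' 'cccaaa yy aa' 'scn-aaa' > "$log"

func_hits=$(match_word aaa "$log")
printf '%s\n' "$func_hits"

rm -f "$log"
```

Quoting the expansion keeps the shell from word-splitting or globbing the pattern before grep sees it.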

Thanks everyone for your suggestions :-)

Upvotes: 1

Maroun

Reputation: 95958

You can try:

grep -P '(?<![\w-])aaa(?![\w-])'

It matches aaa only when it is neither preceded nor followed by a word character (a-z, A-Z, 0-9, _) or a -.

  • (?<![\w-]) is a negative lookbehind: it makes sure that aaa is not preceded by a character in [\w-]

  • (?![\w-]) is a negative lookahead: it makes sure that aaa is not followed by a character in [\w-]
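
A quick check of the lookaround approach against the sample file from the question, as a sketch that assumes GNU grep built with PCRE support (-P). Both lookarounds here use the bracket class [\w-], so a single adjacent word character or hyphen is enough to block the match.

```shell
# Recreate the sample file from the question in a temporary location.
log=$(mktemp)
printf '%s\n' 'aaa xxx zzz' 'bbb xxx aa' 'cccaaa yy aa' 'scn-aaa' > "$log"

# Requires a grep with -P (PCRE) support; this is an assumption about the environment.
pcre_hits=$(grep -P '(?<![\w-])aaa(?![\w-])' "$log")
printf '%s\n' "$pcre_hits"

rm -f "$log"
```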

Upvotes: 0

Jotne

Reputation: 41456

This can also be done with awk:

awk -F"[^[:alnum:]_-]" '{f=0;for (i=1;i<=NF;i++) if ($i=="aaa") f=1}f' file
aaa xxx zzz
cccaaa yy aaa

Here the field separator is set to any character that is not alphanumeric, _ or -.
Every field is then tested, one by one. If any field equals "aaa", the line is printed.
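
If whitespace is the only delimiter that matters, the same field-by-field test can be sketched with awk's default field splitting, run here against the sample file from the question. The variable name word is an assumption for illustration. It is compared as a plain string, so no regex escaping is involved.

```shell
# Recreate the sample file from the question in a temporary location.
log=$(mktemp)
printf '%s\n' 'aaa xxx zzz' 'bbb xxx aa' 'cccaaa yy aa' 'scn-aaa' > "$log"

# 'word' is a hypothetical variable name passed in with -v.
# Print a line as soon as one whitespace-separated field equals it.
awk_hits=$(awk -v word='aaa' '{
    for (i = 1; i <= NF; i++)
        if ($i == word) { print; next }
}' "$log")
printf '%s\n' "$awk_hits"

rm -f "$log"
```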


The field separator has no effect on regex matching, so even with the correct separators the word-boundary operators \< and \> still treat - as a non-word character. Do not use this:

awk -F"[^[:alnum:]_-]" '/\<aaa\>/' file
aaa xxx zzz
cccaaa yy aaa
scn-aaa

Upvotes: 1

halfflat

Reputation: 1584

grep -w counts '-' as beginning a word boundary, which is why it is catching scn-aaa. In short, you want to do what -w does, but with a different definition of what constitutes a valid word character.

For grep, a word character is [_[:alnum:]], i.e. any letter, digit or the underscore character. So we can roll our own grep -w-like match with:

grep -E '(^|[^[:alnum:]_-])aaa($|[^[:alnum:]_-])'

That is, match aaa when it is preceded by the beginning of the line or a non-word character, and followed by the end of the line or a non-word character, where we count '-' also as being a word character.
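
To see the difference on the sample file from the question, here is a minimal sketch that runs plain -w and the custom-boundary pattern side by side. The variable names are illustrative only.

```shell
# Recreate the sample file from the question in a temporary location.
log=$(mktemp)
printf '%s\n' 'aaa xxx zzz' 'bbb xxx aa' 'cccaaa yy aa' 'scn-aaa' > "$log"

# Plain -w treats '-' as a word boundary, so scn-aaa is reported too.
with_w=$(grep -w 'aaa' "$log")

# Custom boundary: '-' is counted as a word character, so scn-aaa is skipped.
custom=$(grep -E '(^|[^[:alnum:]_-])aaa($|[^[:alnum:]_-])' "$log")

printf 'grep -w:\n%s\n' "$with_w"
printf 'custom boundary:\n%s\n' "$custom"

rm -f "$log"
```

The - is placed last inside the bracket expression so that it is taken literally rather than as a range.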

Upvotes: 3
