Tharanga Abeyseela

Reputation: 3483

Positive/Negative lookahead with grep and perl

My login.txt file contains the following entries:

abc def
abc 123
def abc
abc de
tha ewe

When I do a positive lookahead using perl, I get the following result:

> cat login.txt | perl -ne 'print if /(?)abc\s(?=def)/'
abc def

...when I use grep, I get the following result:

> cat login.txt | grep -P '(?<=abc)\s(?=def)'
abc def

A negative lookahead with perl gives the following result:

> cat login.txt | perl -ne 'print if /(?)abc\s(?!def)/'
abc 123
def abc
abc de

...and the grep result:

> cat login.txt | grep -P '(?<=abc)\s(?!def)'
abc 123
abc de

Perl matched def abc for the negative lookahead, but it shouldn't have, because I'm checking for abc that is not followed by def, whereas grep returns the correct result.

Is something missing in my perl pattern?

Upvotes: 18

Views: 36739

Answers (4)

Vinny

Reputation: 174

perl -ne 'print if /(?)abc\s(?!def)/'

To begin, as fugu stated, the (?) is an empty non-capturing construct that matches only the empty string, so it does nothing.

Therefore as written, this regex matches the literal string abc followed by a single [:space:OR:tab:OR:newline], not followed by the literal string def.

Because \s matches a newline character and you did not chomp the trailing newline from each line before matching, def abc matches: (?)abc\s matches abc[:newline:], which is followed by the end of the string ($), not by def.

The corrected regex (accounting for the redundant (?)) would be:

perl -ne 'print if /(?<=abc)\s(?!def)/'

...which matches a single [:space:OR:tab:OR:newline] which is preceded by abc and not followed by def.

This will still match def abc, because once again \s matches the [:newline:], which is preceded by abc and followed by the end of the string ($), not by def.

Either chomp the [:newline:] before evaluating the regex in Perl, or use the character class [ \t] (if you need to account for tab characters) instead of \s:

perl -ne 'print if /(?<=abc)[ \t](?!def)/'

Or simply

perl -ne 'print if /(?<=abc) (?!def)/'

Upvotes: 0

ysth

Reputation: 98398

grep does not include the newline in the string it checks against the regex, so abc\s does not match when abc is at the end of the line. chomp in perl or use the -l command line option and you will see similar results.

I'm not sure why you were making other changes between the perl and grep regexes; what was the (?) supposed to accomplish?

Upvotes: 9

Oleg G

Reputation: 945

In your perl -ne 'print if /(?)abc\s(?!def)/' you are asking perl to find abc, then a whitespace character, and then anything that is not def. This successfully matches def abc, because there is no def after abc there and \s matches the newline.

Upvotes: 2

fugu

Reputation: 6578

I would try anchoring your regex like so:

/(^abc\s+(?!def).+)/

This would capture:

abc 123
abc de

The (?) at the beginning of your negative lookahead regex is redundant.

Upvotes: 4
