Bruno Negrão Zica

Reputation: 824

Negative lookbehind assertion regex has unexpected result with grep -P

I'm testing the following negative lookbehind assertion and I want to understand the result:

echo "foo foofoo" | grep -Po '(?<!foo)foo'

It prints:

foo
foo
foo

I expected only the first two foo to be printed, not the third, because my assertion is supposed to mean: find a 'foo' that is not preceded by 'foo'.

What am I missing? Why is the third foo being matched?

Note: grep -P means interpret the regex as a Perl-compatible regex (PCRE). grep -o means print only the matched string. My grep is version 2.5.1.

Upvotes: 5

Views: 597

Answers (2)

Bruno Negrão Zica

Reputation: 824

After a long discussion of this issue (since moved to chat), I concluded that my understanding of the negative lookbehind assertion was correct:

echo "foo foofoo" | grep -Po '(?<!foo)foo'

Should return foo two times.

My version of grep, or the PCRE lib that it was compiled with, is buggy.

Some people tested this command on their machines with different versions of grep and they had different results. Some have seen two foo and others had three foo, like me.
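
When comparing results across machines, it helps to record the grep version next to the match count. A minimal sketch, assuming a grep that accepts --version and was built with -P support; the function name report_grep_lookbehind is made up for illustration:

```shell
# Hypothetical helper: print the grep version line, then the number of
# matches grep -Po reports for the pattern from the question.
report_grep_lookbehind() {
    grep --version | head -n 1
    echo "foo foofoo" | grep -Po '(?<!foo)foo' | wc -l
}

report_grep_lookbehind
```

A count of 2 is what the lookbehind should produce; a count of 3 reproduces the behavior described in the question.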

I tested that regex with Perl and I had the expected result, foo two times.

The grep man page states that the -P option is experimental.

My lesson was: if you want PCRE that really works, use Perl.

Upvotes: 1

Sobrique

Reputation: 53508

I can't reproduce this - running the exact command, I only get two matches.

I'm using GNU grep 2.6.3

However, a useful trick for troubleshooting a regex is that Perl lets you run it in debug mode:

#!/usr/bin/env perl
use strict;
use warnings;

#dump results
use Data::Dumper;

#set regex into debug mode
use re 'debug';

#iterate __DATA__ below
while ( <DATA> ) {
    #apply regex to current line
    my @matches = m/(?<!foo)(foo)/g;
    print Dumper \@matches;
}

__DATA__
foo foofoo

This gives us output of:

Compiling REx "(?<!foo)(foo)"
Final program:
   1: UNLESSM[-3] (7)
   3:   EXACT <foo> (5)
   5:   SUCCEED (0)
   6: TAIL (7)
   7: OPEN1 (9)
   9:   EXACT <foo> (11)
  11: CLOSE1 (13)
  13: END (0)
anchored "foo" at 0 (checking anchored) minlen 3 
Matching REx "(?<!foo)(foo)" against "foo foofoo"
Intuit: trying to determine minimum start position...
  doing 'check' fbm scan, [0..10] gave 0
  Found anchored substr "foo" at offset 0 (rx_origin now 0)...
  (multiline anchor test skipped)
Intuit: Successfully guessed: match at offset 0
   0 <> <foo foofoo>         |  1:UNLESSM[-3](7)
   0 <> <foo foofoo>         |  7:OPEN1(9)
   0 <> <foo foofoo>         |  9:EXACT <foo>(11)
   3 <foo> < foofoo>         | 11:CLOSE1(13)
   3 <foo> < foofoo>         | 13:END(0)
Match successful!
Matching REx "(?<!foo)(foo)" against " foofoo"
Intuit: trying to determine minimum start position...
  doing 'check' fbm scan, [3..10] gave 4
  Found anchored substr "foo" at offset 4 (rx_origin now 4)...
  (multiline anchor test skipped)
  try at offset...
Intuit: Successfully guessed: match at offset 4
   4 <foo > <foofoo>         |  1:UNLESSM[-3](7)
   1 <f> <oo foofoo>         |  3:  EXACT <foo>(5)
                                    failed...
   4 <foo > <foofoo>         |  7:OPEN1(9)
   4 <foo > <foofoo>         |  9:EXACT <foo>(11)
   7 <foo foo> <foo>         | 11:CLOSE1(13)
   7 <foo foo> <foo>         | 13:END(0)
Match successful!
Matching REx "(?<!foo)(foo)" against "foo"
Intuit: trying to determine minimum start position...
  doing 'check' fbm scan, [7..10] gave 7
  Found anchored substr "foo" at offset 7 (rx_origin now 7)...
  (multiline anchor test skipped)
Intuit: Successfully guessed: match at offset 7
   7 <foo foo> <foo>         |  1:UNLESSM[-3](7)
   4 <foo > <foofoo>         |  3:  EXACT <foo>(5)
   7 <foo foo> <foo>         |  5:  SUCCEED(0)
                                    subpattern success...
                                  failed...
Match failed

Upvotes: 0
