user15964

Reputation: 2639

regex: Why doesn't this negative lookahead work?

I have text like this

real:: a
real :: b
real c

Now I want to match only those occurrences of real that are not followed by ::. In this case, that is only the third real. So I tried a regex with a negative lookahead:

real\s*(?!::)

But this matches

real :: b
real c

Since \s* means zero or more whitespace characters, why is real :: b matched?

update

Thanks to Wiktor Stribiżew. Using the regex101 debugging tool, we can see that backtracking is what complicates things.

I came up with another, similar task that I can't solve:

real (xx(yy)) :: a
real (zz(pp)):: b
real (cc(rr)) c

Again, I want to match only real (cc(rr)), which is not followed by ::.

real\s*\(.*?\)+(?!\s*::)

This is what I tried, but it fails. According to the regex debugger, this is also due to backtracking. How can I do this correctly?

Upvotes: 7

Views: 10802

Answers (1)

Wiktor Stribiżew

Reputation: 626690

You need to put the \s* into the lookahead:

real(?!\s*::)
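As a quick check, here is a minimal sketch that runs this pattern over the three sample lines from the question. It assumes Python's re module, which the question does not specify; the variable names are only illustrative.

```python
import re

# Sample lines from the question.
text = "real:: a\nreal :: b\nreal c"

# The whitespace is consumed inside the lookahead, so after "real" there is
# nothing the engine could give back to make the lookahead succeed.
fixed = re.compile(r"real(?!\s*::)")

matched_lines = [line for line in text.splitlines() if fixed.search(line)]
print(matched_lines)  # ['real c']
```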

See the regex demo

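The pattern real\s*(?!::) matches the real in real :: b for the following reason: real matches the literal text, then \s* matches zero or more whitespace characters (here, the single space), and then the lookahead fails because :: comes next. At that point the engine backtracks, that is, it gives back the space matched by \s* and tries again. Since \s* can match an empty string, the lookahead is now checked right after real, where the next two characters are a space and a colon rather than ::, so it succeeds and the real before :: b gets matched.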
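The effect of that backtracking can be seen by printing what the original pattern actually captures on each sample line. This is a small sketch, again assuming Python's re module; the helper name show_match is hypothetical.

```python
import re

original = re.compile(r"real\s*(?!::)")

def show_match(line):
    """Return the matched text, or None when the pattern does not match."""
    m = original.search(line)
    return m.group() if m else None

results = {line: show_match(line) for line in ["real:: a", "real :: b", "real c"]}
for line, matched in results.items():
    print(repr(line), "->", repr(matched))

# 'real:: a'  -> None     (\s* matched nothing, so there is nothing to give back)
# 'real :: b' -> 'real'   (\s* gave back the space; lookahead then sees " :")
# 'real c'    -> 'real '  (\s* keeps the space; lookahead sees "c")
```

Note that on real :: b the match is just real with no trailing space, which is the sign that \s* was backtracked to the empty string.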

See the regex debugger scheme at regex101 showing what is going on behind the scenes:

[Image: regex101 debugger screenshot]
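For the follow-up task in the update, the same kind of backtracking is at work: when the lookahead fails after \)+ has consumed both closing parentheses, \)+ gives one of them back, and the lookahead then sees a ) instead of ::. One possible approach, sketched here under the assumptions that the parentheses nest at most one level deep (as in the sample lines) and that Python's re module is used, is to describe the parenthesised group precisely enough that it can only match in one way:

```python
import re

# Sample lines from the update in the question.
text = "real (xx(yy)) :: a\nreal (zz(pp)):: b\nreal (cc(rr)) c"

# \( (?:[^()]|\([^()]*\))* \) matches one parenthesised group that may
# contain inner groups one level deep. Because the group can only end at its
# own closing parenthesis, backtracking cannot shorten it to dodge the
# lookahead. Deeper nesting is NOT handled by this sketch.
pattern = re.compile(r"real\s*\((?:[^()]|\([^()]*\))*\)(?!\s*::)")

matches = [m.group() for m in pattern.finditer(text)]
print(matches)  # ['real (cc(rr))']
```

Engines that support atomic groups or possessive quantifiers offer another way to stop the engine from giving characters back, but that depends on the regex flavor in use.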

Upvotes: 11
