Reputation: 241890
Consider the following test data:
x.foo,x.bar
y.foo,y.bar
yy.foo,yy.bar
x.foo,y.bar
y.foo,x.bar
yy.foo,x.bar
x.foo,yy.bar
yy.foo,y.bar
y.foo,yy.bar
I'm attempting to write a regular expression where the string before .foo
and the string before .bar
are different from each other. The first three items should not match. The other six should.
This mostly works:
^(.+?)\.foo,(?!\1)(.+?)\.bar$
However, it misses on the last one, because y
is in match group 1, and thus yy
is not matched in match group 2.
Interactive: https://regex101.com/r/Pv5062/1
How can I modify the negative lookahead pattern such that the last item matches as well?
Upvotes: 2
Views: 77
Reputation: 627469
Inline backreferences do not store the context information, they only keep the text captured. You need to specify the context yourself.
You may add a dot after \1
:
^(.+?)\.foo,(?!\1\.)(.+?)\.bar$
^^
Or, even repeat the part after the second (.+?)
:
^(.+?)\.foo,(?!\1\.bar$)(.+?)\.bar$
Or, if the bar
part cannot contain .
, you may make it more "generic":
^(.+?)\.foo,(?!\1\.[^.]+$)(.+?)\.bar$
See the regex demo and another regex demo.
The point is: your (?!\1)
is not "anchored" and will fail the match in case the text stored in Group 1 appears immediately to the right of the current location regardless of the context. To solve this, you need to provide this context. As the value that can be matched with .+?
can contain virtually anything all you can rely on is the "hardcoded" bits after the lookahead.
Upvotes: 2