Reputation: 33
Suppose there is a sequence ...a...b...a...b...c..., where a, b and c are string constants and the dots represent an arbitrary number of other symbols (not a, b or c) between them.
I would like to match "a...b...c" in the laziest way (the last a-b-c triad), but the regexp engine grabs the bigger "a...b...a...b...c" and goes further.
I tried using a negative lookahead, in forms such as a\w+b\w+(?!a)\w+c
or a\w+b\w+?(?!a)\w+?c
etc., but haven't succeeded so far.
Upvotes: 2
Views: 102
Reputation: 627292
You need to use tempered greedy tokens here in between the three parts.
Imagine a is abc, b is bff and c is cca. Then you'd use
(?s)abc(?:(?!abc).)*?bff(?:(?!abc).)*?cca
See the regex demo
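A minimal Python sketch of the pattern above, in case it helps to try it locally. The sample text is a hypothetical input made up for illustration; only the regex itself comes from this answer.

```python
import re

# Hypothetical sample text: two abc/bff pairs followed by a single cca.
text = "abc 1 bff 2 abc 3 bff 4 cca"

# Tempered greedy tokens: (?:(?!abc).)*? consumes one character at a time,
# but only while that character does not begin another "abc".
pattern = re.compile(r"(?s)abc(?:(?!abc).)*?bff(?:(?!abc).)*?cca")

match = pattern.search(text)
result = match.group(0) if match else None
print(result)  # abc 3 bff 4 cca
```

An attempt starting at the first abc fails because neither token may step over the second abc, so the engine moves on and succeeds from the second one.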
Details:
(?s) - same as the re.S or re.DOTALL modifier; makes . match newlines too
abc - a literal string abc, or some pattern #1
(?:(?!abc).)*? - any 0+ chars (including newlines, because of (?s)), as few as possible, none of which starts an abc sequence
bff - a literal string bff, or some pattern #2
(?:(?!abc).)*? - see above
cca - a literal string cca, or some pattern #3
Upvotes: 1
Reputation: 782130
There's no need to use lookarounds for this. Put .* at the beginning of the regexp, and put what you want to match into a capture group:
.*(a.*b.*c)
Then use .group(1) to get the contents of the capture group.
The greedy .* at the beginning makes this find the last triad.
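A short Python sketch of this approach, with abc, bff and cca standing in for a, b and c. The sample text is hypothetical, and re.S is an added assumption so that . also crosses line breaks; drop it if the input is a single line.

```python
import re

# Hypothetical sample text: two abc/bff pairs followed by a single cca.
text = "abc 1 bff 2 abc 3 bff 4 cca"

# The leading greedy .* consumes as much as it can, so the capture group
# starts at the right-most "abc" that is still followed by "bff" and "cca".
match = re.search(r".*(abc.*bff.*cca)", text, re.S)
last_triad = match.group(1) if match else None
print(last_triad)  # abc 3 bff 4 cca
```

Note that the .* parts inside the group are greedy as well, so if several cca occur after the last abc, the group extends to the last of them.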
Upvotes: 0