einy

Reputation: 33

Hint on using a regex lookaround needed

Suppose there is a sequence ...a...b...a...b...c..., where a, b and c are string constants and the dots represent an arbitrary number of other symbols (not a, b or c) between them.

I would like to match "a...b...c" in the laziest way (the last a-b-c triad), but the regex engine grabs the bigger "a...b...a...b...c" and goes further.

I tried a negative lookahead in forms such as a\w+b\w+(?!a)\w+c or a\w+b\w+?(?!a)\w+?c, but haven't succeeded so far.

Upvotes: 2

Views: 102

Answers (2)

Wiktor Stribiżew

Reputation: 627292

You need to use tempered greedy tokens here in between the three parts.

Imagine a is abc, b is bff and c is cca. Then, you'd use

(?s)abc(?:(?!abc).)*?bff(?:(?!abc).)*?cca

See the regex demo

Details:

  • (?s) - same as re.S or re.DOTALL modifier, makes . match newlines
  • abc - a literal string abc or some pattern #1
  • (?:(?!abc).)*? - any 0+ chars (including newlines, because of (?s)), none of which starts an abc sequence, as few as possible
  • bff - a literal string bff or some pattern #2
  • (?:(?!abc).)*? - see above
  • cca - a literal string cca or some pattern #3
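
A minimal Python sketch of the pattern above, for illustration only. The sample text is made up, and abc, bff and cca are the stand-ins for a, b and c used in this answer. A plain lazy pattern is included for contrast:

```python
import re

# Hypothetical sample input; "abc", "bff" and "cca" stand in for a, b and c.
text = "xx abc 1 bff 2 abc 3 bff 4 cca yy"

# Plain lazy quantifiers still start at the first "abc" and run through the second one.
naive = re.search(r"(?s)abc.*?bff.*?cca", text)

# The tempered token (?:(?!abc).)*? cannot step over another "abc",
# so the attempt at the first "abc" fails and the engine moves on to the second.
tempered = re.search(r"(?s)abc(?:(?!abc).)*?bff(?:(?!abc).)*?cca", text)

naive_result = naive.group(0) if naive else None
tempered_result = tempered.group(0) if tempered else None

print(naive_result)     # abc 1 bff 2 abc 3 bff 4 cca
print(tempered_result)  # abc 3 bff 4 cca
```

If a, b and c are arbitrary literal strings rather than patterns, passing them through re.escape() before building the regex is probably a good idea.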

Upvotes: 1

Barmar

Reputation: 782130

There's no need to use lookarounds for this. Put .* at the beginning of the regexp, and put what you want to match into a capture group:

.*(a.*b.*c)

Then use .group(1) to get the contents of the capture group.

The greedy .* at the beginning makes this find the last triad.
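
A short Python sketch of this approach, under assumptions: the sample text is invented, abc, bff and cca stand in for a, b and c, and (?s) is added so that . also matches newlines:

```python
import re

# Hypothetical sample input; "abc", "bff" and "cca" stand in for a, b and c.
text = "xx abc 1 bff 2 abc 3 bff 4 cca yy"

# The leading greedy .* consumes as much as it can, then backtracks only
# until the group can match, so the group starts at the last usable "abc".
match = re.search(r"(?s).*(abc.*bff.*cca)", text)

last_triad = match.group(1) if match else None
print(last_triad)  # abc 3 bff 4 cca
```

Note that the .* inside the group are greedy as well, so if several cca follow the last abc, the group extends to the last of them.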


Upvotes: 0
