Reputation: 35169
(Note: the following uses JavaScript-flavored RegExen, in which . does not match a newline, but [^] does.)
Imagine I have this text:
chaff more chaff START PATTERN more chaff
chaff more chaff START PATTERN juicy stuff
juicy stuff juicy stuff END PATTERN chaff
chaff START PATTERN more juicy stuff more
juicy stuff END PATTERN
... and I want a RegEx with a global flag (g) that captures the juicy stuff. Specifically, I want the first match to be
START PATTERN juicy stuff
juicy stuff juicy stuff END PATTERN
and the second match to be
START PATTERN more juicy stuff more
juicy stuff END PATTERN
The fly in the ointment is that first START PATTERN. I've spent some time in regex101.com (an awesome tool for those that don't know it), and this one does not work:
/(?:START PATTERN[^]+)?(START PATTERN[^]+END PATTERN)/g
It captures the second group ("more juicy stuff") but not the first. I've also tried various combinations of negative lookahead, but without success.
Ideas?
Upvotes: 3
Views: 59
Reputation: 626853
You need a tempered greedy token:
START PATTERN(?:(?!(?:START|END) PATTERN)[^])*END PATTERN
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
See the regex demo.
The (?:(?!(?:START|END) PATTERN)[^])* construct is called a tempered greedy token because the greedy * quantifier is tempered with a negative lookahead. Inside the lookahead we list all the patterns that we do not want to match up to the trailing delimiter.
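As a quick check, here is a minimal sketch that runs the tempered greedy token against the sample text from the question. The variable names (text, tempered, matches) are only illustrative.

```javascript
// Sample text from the question, joined with newlines.
const text = [
  "chaff more chaff START PATTERN more chaff",
  "chaff more chaff START PATTERN juicy stuff",
  "juicy stuff juicy stuff END PATTERN chaff",
  "chaff START PATTERN more juicy stuff more",
  "juicy stuff END PATTERN",
].join("\n");

// Before consuming each character, the lookahead refuses to step onto
// another "START PATTERN" or "END PATTERN", so a match can never span
// a dangling START delimiter.
const tempered = /START PATTERN(?:(?!(?:START|END) PATTERN)[^])*END PATTERN/g;

// With the g flag, String.prototype.match returns all full matches (or null).
const matches = text.match(tempered) || [];

matches.forEach((m, i) => console.log(`match ${i + 1}:\n${m}\n`));
```

The first START PATTERN (the one with no END PATTERN before the next START) is skipped, and the two wanted blocks come back as separate matches.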
Note you can add more precision by adding word boundaries if you plan to match the literal words START and END:
\bSTART PATTERN\b(?:(?!\b(?:START|END) PATTERN)[^])*\bEND PATTERN
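To illustrate what the word boundaries change, here is a small sketch comparing the two variants on a made-up input in which START appears inside a longer word (RESTART). The input string and variable names are assumptions chosen for the demonstration, not part of the question.

```javascript
// Hypothetical input: "START" also occurs as the tail of "RESTART".
const sample = "START PATTERN a RESTART PATTERN b END PATTERN";

const plain = /START PATTERN(?:(?!(?:START|END) PATTERN)[^])*END PATTERN/g;
const bounded = /\bSTART PATTERN\b(?:(?!\b(?:START|END) PATTERN)[^])*\bEND PATTERN/g;

// Without \b, the "START PATTERN" inside "RESTART PATTERN" counts as a
// delimiter, so the match begins in the middle of a word.
const plainMatches = sample.match(plain) || [];

// With \b, "RESTART" is ordinary content and the whole span is matched.
const boundedMatches = sample.match(bounded) || [];

console.log(plainMatches);
console.log(boundedMatches);
```

Whether the stricter behaviour is what you want depends on whether the delimiters can legitimately occur inside longer words in your data.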
Note that to make it more efficient, we can unroll it:
START PATTERN[^ES]*(?:S(?!TART PATTERN)[^ES]*|E(?!ND PATTERN)[^ES]*)*END PATTERN
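A short sanity check, sketched under the assumption that the unrolled pattern is meant to be a drop-in replacement: it should return the same matches as the tempered version. The second input, with stray capital E and S characters, is invented here to exercise both branches of the unrolled loop.

```javascript
const tempered = /START PATTERN(?:(?!(?:START|END) PATTERN)[^])*END PATTERN/g;
const unrolled = /START PATTERN[^ES]*(?:S(?!TART PATTERN)[^ES]*|E(?!ND PATTERN)[^ES]*)*END PATTERN/g;

// Sample text from the question.
const questionText = [
  "chaff more chaff START PATTERN more chaff",
  "chaff more chaff START PATTERN juicy stuff",
  "juicy stuff juicy stuff END PATTERN chaff",
  "chaff START PATTERN more juicy stuff more",
  "juicy stuff END PATTERN",
].join("\n");

// Hypothetical input containing E and S characters that are not delimiters.
const trickyText = "START PATTERN SOME EXTRA END PATTERN";

// Helper (illustrative): collect all full matches as an array.
const allMatches = (re, s) => s.match(re) || [];

for (const input of [questionText, trickyText]) {
  const a = allMatches(tempered, input);
  const b = allMatches(unrolled, input);
  console.log(JSON.stringify(a) === JSON.stringify(b) ? "same matches" : "different matches");
}
```

The unrolled form avoids running a lookahead before every single character: it only checks at an S or an E, which is where a delimiter could begin.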
See another demo
Upvotes: 2