Reputation: 815
Say I had the line
"The quick brown fox jumps over the lazy dog"
and I wanted to grab everything between "brown" and "over", where the boundary words may also be substrings of other words. So I am trying to tell the RegEx something like
"grab everything in this line beginning at the string brown until you find the string over"
So I did
brown[^("over")]*
but the result is brown f, because "fox" contains an "o", which is contained in "over".
I just couldn't find a solution to this, so I hope you can help.
Upvotes: 2
Views: 311
Reputation: 626691
All right, matching really anything between 2 substrings (where the trailing substring must be the leftmost match, i.e. the one closest to the leading substring) is best achieved with the unroll-the-loop technique, which involves negated character classes (sometimes combined with a lookahead).
Here is one for your case:
\bbrown\b[^o]*(?:o(?!ver\b)[^o]*)*\bover\b
See the regex demo
Note that this expression is basically equivalent to (?s)\bbrown\b.*?\bover\b, where .*? matches 0 or more of any character, but as few as possible to return a valid match. However, the unrolled pattern involves much less backtracking since it is linear.
The lazy matching is unrolled into [^o]*(?:o(?!ver\b)[^o]*)* here. The negated character class [^o] matches any character but o, newlines included. Thus, we do not have to worry about matching newlines.
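As a rough check of the equivalence claimed above, here is a minimal Python sketch that runs both patterns over a few inputs. The sample strings and the first_match helper are made up for illustration and are not part of the original question.

```python
import re

# Unrolled pattern from the answer and its lazy-dot counterpart.
UNROLLED = re.compile(r"\bbrown\b[^o]*(?:o(?!ver\b)[^o]*)*\bover\b")
LAZY = re.compile(r"(?s)\bbrown\b.*?\bover\b")

# Illustrative inputs (assumptions, not taken from the question).
samples = [
    "The quick brown fox jumps over the lazy dog",
    "brown cow\nmoved over here",  # newline between the delimiters
    "brown fox only",              # no closing delimiter at all
]


def first_match(pattern, text):
    """Return the first matched substring, or None when nothing matches."""
    m = pattern.search(text)
    return m.group(0) if m else None


results = [(first_match(UNROLLED, s), first_match(LAZY, s)) for s in samples]
for unrolled, lazy in results:
    print(repr(unrolled), repr(lazy))
```

On these inputs both patterns return the same text, including across the newline in the second sample. This only demonstrates agreement on a handful of strings; it is not a proof or a benchmark.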
The \b word boundaries help match whole words only. If you do not need whole-word matching, just remove all \b from the pattern.
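To see what the word boundaries change, here is a small Python sketch comparing the pattern with and without \b. The test sentence is an invented example in which both delimiters also appear inside longer words.

```python
import re

# With word boundaries: the delimiters must be whole words.
WHOLE_WORDS = re.compile(r"\bbrown\b[^o]*(?:o(?!ver\b)[^o]*)*\bover\b")
# With every \b removed: the delimiters may sit inside other words.
SUBSTRINGS = re.compile(r"brown[^o]*(?:o(?!ver)[^o]*)*over")

# Hypothetical input: "brown" occurs inside "brownish", "over" inside "hovers".
text = "a brownish brown fox hovers over it"

whole = WHOLE_WORDS.search(text).group(0)
sub = SUBSTRINGS.search(text).group(0)

print(whole)  # brown fox hovers over
print(sub)    # brownish brown fox hover
```

With \b the match skips "brownish" and "hovers" and runs from the standalone "brown" to the standalone "over". Without \b it starts inside "brownish" and stops at the "over" inside "hovers", which is the behavior wanted when the boundary strings may be parts of other words.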
Here is my regex breakdown:
\bbrown\b - matches brown as a whole word
[^o]* - 0 or more characters other than o
(?:o(?!ver\b)[^o]*)* - 0 or more sequences of o that is not followed by ver ((?!ver\b)) and followed by 0 or more characters other than o ([^o]*)
\bover\b - matches a whole word over.
Upvotes: 2