Reputation: 15877
This pattern:
/a+?b+?/
Against the following string:
aaaaaabbbbbb
Matches:
aaaaaab
We see that the non-greedy quantifier behaves differently in the backward/left direction (takes all) and in the forward/right direction (takes just one).
Is there a way to make the non-greedy quantifier at the beginning, the one that matches all the a
, match as few as possible too, so that it behaves the same way as the b
part at the end?
Upvotes: 13
Views: 8571
Reputation: 25855
Regexes generally match from left-to-right unless you set a right-to-left flag (which very few flavors support). In either case, they do not start in the middle and then work out in both directions, even if you use a lookbehind.
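To see why the leading lazy quantifier ends up taking every a, it may help to look at where the match starts. Here is a minimal sketch using the pattern and string from the question. The variable names are mine, and the sticky flag is only used as a way to force a later start position:

```javascript
// Pattern and input taken from the question.
const pattern = /a+?b+?/;
const text = "aaaaaabbbbbb";

const m = pattern.exec(text);

// The engine tries position 0 first. From there, a+? has to keep
// expanding until a "b" can follow, so it ends up taking every "a".
console.log(m.index); // 0
console.log(m[0]);    // "aaaaaab"

// Forcing the attempt to start at the last "a" (index 5) shows that
// the lazy a+? is satisfied with a single "a" when that is enough.
const sticky = /a+?b+?/y;
sticky.lastIndex = 5;
const startAt5 = sticky.exec(text);
console.log(startAt5[0]); // "ab"
```

So the quantifier is lazy in both cases; what differs is the position where the attempt begins.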
It helps to stop and ask - why does the lazy quantifier exist in the first place? What problem was it meant to solve?
Normal (greedy) quantifiers work by finding a matching pattern of text and then repeatedly matching a sequence of characters until they can match no more. This behavior is usually desired, but you run into problems when you have a very general pattern followed by a very specific pattern where the specific pattern is a subset of the general pattern.
For example, consider the following input:
_abc_END_def_END
And this pattern:
(\w+END)
The intent was to match _abc_
and then END
. The problem is that END
is a subset of \w+
. Using standard "greedy" rules, the \w+
matches as much as possible. So rather than matching _abc_
, it matched _abc_END_def_
.
The solution to this scenario is to change the way the quantifier (+
) behaves with the lazy modifier ?
. By changing the expression to \w+?
, the regex engine is forced to match only as much as necessary to satisfy the expression and no more. The expression is satisfied when \w+?
matches _abc_
and END
matches its literal string.
The purpose of the lazy quantifier is not to match a "minimum" number of characters - it is about giving that second pattern, a subset of the first, an opportunity to match.
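The difference between the greedy and lazy versions of that example can be checked with a short sketch like the following (the variable names are just for illustration):

```javascript
// Input and patterns from the example above.
const sample = "_abc_END_def_END";

// Greedy: \w+ grabs as much as it can, then backtracks to the LAST "END".
const greedy = /(\w+END)/.exec(sample)[1];

// Lazy: \w+? expands one character at a time and stops at the FIRST "END".
const lazy = /(\w+?END)/.exec(sample)[1];

console.log(greedy); // "_abc_END_def_END"
console.log(lazy);   // "_abc_END"
```

Both matches start at the same position; the lazy modifier only changes how far the quantifier extends to the right.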
In your example, b
is not a subset of a
, so there is no need for the lazy quantifier. If you want to match one or more a's, but as few as possible, and one or more b's, but as few as possible, then you'd simply use:
ab
Or, if your a
is a stand-in for some superset which may include b:
[ab]b
For example:
\wb
All of these would match:
ab
Example:
const input = "aaabbb";
// exec() returns the first match; element 0 is the full matched text.
console.log(/ab/.exec(input)[0]); // logs "ab"
Upvotes: 7
Reputation: 141
If you do not have the ability to do the previously mentioned right-to-left match, you can simply reverse the string, reverse the regex, and then reverse the result at the end.
The work is as follows:
Start with aaaaaabbbbbb
Reverse to bbbbbbaaaaaa
Reverse /a+?b+?/ to /b+?a+?/
The resulting Match is bbbbbba
Reverse the resulting match to get abbbbbb
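The steps above could be sketched in JavaScript roughly like this. The reverse helper is a hypothetical name of my own, the pattern is reversed by hand rather than programmatically, and the helper assumes plain text without combining characters:

```javascript
// Hypothetical helper: reverses a string by code points.
function reverse(s) {
  return Array.from(s).reverse().join("");
}

const original = "aaaaaabbbbbb";
const reversedInput = reverse(original); // "bbbbbbaaaaaa"

// /a+?b+?/ reversed by hand.
const reversedPattern = /b+?a+?/;

const hit = reversedPattern.exec(reversedInput);

// Reverse the match back into the original orientation.
const result = hit ? reverse(hit[0]) : null;
console.log(result); // "abbbbbb"
```

Note that this gives the mirror image of the original behavior: now the b part takes everything and the a part takes just one.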
Upvotes: 1
Reputation: 336418
They do behave the same! A lazy quantifier (in this case a lazy +
) tells the regex engine to match as few repetitions as possible while still allowing the overall match to succeed.
Regexes don't match "leftwards" or "backwards", as you seem to imply.
What exactly are you trying to achieve? I guess it's not this simple example - that would be trivial to fix (just make the regex ab
, which is probably not what you're looking for).
Upvotes: -1