Reputation: 666
The idea of lazy and greedy are easy to understand, but I have only seen/used *+
once in my regex (in Java) [A]|[^B]*+(?!C)
(A,B,C are arbitrary values) simply because it worked when the lazy modifier resulted in a StackOverflow error.
Because of most search engines' inability to search symbols, I can't find any documentation on this. So what exactly does *+ do and how does it do it?
Upvotes: 4
Views: 1044
Reputation: 30445
As the other answers point out, *+
is a "possessive quantifier" It matches the previous element as many times as possible, just like a greedy quantifier, but never backtracks.
Why is this useful? Only as a performance optimization. Further, only as a performance optimization when the regex doesn't match. This is an important point to understand about regular expressions: their worst-case performance always occurs when they don't match.
Depending on the regex engine in use, and the details of the regex itself, worst-case performance can sometimes be stunningly bad. For a simple example, take this regex: a*a*a*b
, matched against this string: aaaaac
.
Confronted with this situation, a standard "NFA" type regex engine will do something like this:
a
5 times, the 2nd a
zero times, and the 3rd a
zero times.b
against the c
-- it fails.a
4 times, the 2nd 1 time, and the 3rd zero times.b
again -- it fails.a
4 times, the 2nd zero times, and the 3rd 1 time.a
3 times, the 2nd 2 times, and the 3rd zero times...(I think you can fill out the next few hundred steps by yourself.)
If the regex was a*+a*a*b
, that would never happen. It would be more like:
a
5 times, the 2nd zero times, and the 3rd zero times.b
-- it fails.a
is "possessive", once it has matched 5 times, it can never try matching a smaller number of times. And since there are no a
s left in the string for the other a*
s to match, they can only match zero times. There is nothing left to try, so the match as a whole fails.Upvotes: 2
Reputation: 1165
X*+ means X, zero or more times (Possessive)
The possessive quantifiers always eat the entire input string, trying once (and only once) for a match. Unlike the greedy quantifiers, possessive quantifiers never back off, even if doing so would allow the overall match to succeed.
Upvotes: 1
Reputation: 61138
A greedy quantifier matches everything it can and then the pattern backtracks until the match succeeds.
A lazy quantifier forward tracks until the match succeeds.
A possessive quantifier matches everything it can and never backtracks.
The +
denotes a possessive quantifier. If can be used as, for example, ++
or *+
.
This ability to prevent backtracking means it can stop catastrophic backtracking.
Upvotes: 7