Reputation: 3403
This is a snippet of a complex regex:
/\x87([\xA6-\xBf]|\xA6\xF0\x9F)/x
Why is it stopping and returning \x87\xA6 instead of \x87\xA6\xF0\x9F when matching against a string containing \x87\xA6\xF0\x9F?
I thought regex was greedy by default and would try to consume the longest pattern? Or is that only for the * and + operators?
Is there any way I can force it to look for the longest pattern? Using word boundaries is not an option in this case unfortunately.
ETA: apparently it works as desired if I move the shorter alternative to the end:
/\x87(\xA6\xF0\x9F|[\xA6-\xBf])/x
Is it really that simple, and regex is sensitive to the order of the alternatives in the pattern?
Upvotes: 2
Views: 44
Reputation: 183446
I thought regex was greedy by default and would try to consume the longest pattern?
"Greediness" refers to the preference of the quantifiers (?, *, +, etc.) for repeating more times rather than fewer. That's not exactly the same as consuming the longest substring, though of course it usually works out that way.
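To make the quantifier behaviour concrete, here is a minimal sketch using Python's re module. Python is an assumption on my part, since the question doesn't say which engine is in use, and the sample strings are made up for illustration. It only shows that greedy quantifiers repeat as many times as they can, while their lazy counterparts (with a trailing ?) repeat as few times as they can.

```python
import re

# Hypothetical sample input, chosen only to illustrate quantifier preference.
text = "aaa"

# Greedy +: prefers more repetitions, so it takes all three characters.
greedy_plus = re.match(r"a+", text).group()

# Lazy +?: prefers fewer repetitions, so it stops after one character.
lazy_plus = re.match(r"a+?", text).group()

# Greedy ?: prefers to match the optional "b" when it is present.
greedy_optional = re.match(r"ab?", "ab").group()

# Lazy ??: prefers to skip the optional "b" even though it is present.
lazy_optional = re.match(r"ab??", "ab").group()

print(greedy_plus, lazy_plus, greedy_optional, lazy_optional)
```

Note that none of this involves |, which is why greediness alone doesn't explain the result in the question.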
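The alternation operator | also has a preference: it tries what's before the | first, and only moves on to what's after it if the first alternative doesn't lead to an overall match. In your pattern, [\xA6-\xBf] matches \xA6 and nothing comes after the group, so the match succeeds right there and the longer alternative is never tried. You can fix your pattern by writing: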
/\x87(\xa6\xF0\x9F|[\xa6-\xbf])/x
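Here is a small sketch of both orderings, again using Python's re module on a bytes pattern as a stand-in for whatever engine you are using (that choice is an assumption; the /x flag is dropped because the pattern contains no whitespace). Backtracking engines such as Perl, PCRE and Python all try alternatives left to right; POSIX-style leftmost-longest engines behave differently.

```python
import re

# The input from the question, as raw bytes.
data = b"\x87\xa6\xf0\x9f"

# Original ordering: the one-byte character class comes first.
short_first = re.compile(rb"\x87([\xa6-\xbf]|\xa6\xf0\x9f)")

# Reordered: the three-byte sequence comes first.
long_first = re.compile(rb"\x87(\xa6\xf0\x9f|[\xa6-\xbf])")

# The class matches \xa6 and the pattern ends, so the match stops there.
short_match = short_first.search(data).group(0)

# The longer alternative is tried first and succeeds.
long_match = long_first.search(data).group(0)

print(short_match)
print(long_match)
```

With the shorter alternative first, the match is the two bytes \x87\xa6; with the longer alternative first, it is all four bytes. So yes, the order of alternatives matters, and putting the more specific (longer) alternative first is the usual fix.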
Upvotes: 2