ck_

Reputation: 3403

Why is regex being lazy instead of greedy in this case?

This is a snippet of a complex regex:

/\x87([\xA6-\xBf]|\xA6\xF0\x9F)/x

Why does it stop and return \x87\xA6 instead of \x87\xA6\xF0\x9F when matching against a string containing \x87\xA6\xF0\x9F?

I thought regex was greedy by default and would try to consume the longest pattern?

Or is that only for the * and + operators?

Is there any way I can force it to look for the longest pattern? Using word boundaries is not an option in this case unfortunately.


ETA: Apparently it works as desired if I move the shorter pattern to the end:

/\x87(\xA6\xF0\x9F|[\xA6-\xBf])/x

Is it really that simple, and regex is sensitive to the order of the alternatives?

Upvotes: 2

Views: 44

Answers (1)

ruakh

Reputation: 183446

I thought regex was greedy by default and would try to consume the longest pattern?

"Greediness" refers to the preference of the quantifiers (?, *, +, etc.) for repeating more times rather than fewer. That's not exactly the same as consuming the longest substring, though of course it usually works out that way.

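To make the distinction concrete, here is a minimal sketch in Python's re module (an assumption on my part, since the question doesn't say which engine is in use; the toy strings are made up for illustration). It shows a greedy and a lazy quantifier side by side, and a case where a greedy quantifier does not lead to the longest possible match:

```python
import re

text = "aaa"

# Greedy: + repeats as many times as it can.
greedy = re.match(r"a+", text).group(0)   # "aaa"

# Lazy: +? repeats as few times as it can.
lazy = re.match(r"a+?", text).group(0)    # "a"

# Greedy is not the same as "longest overall match":
# a? greedily takes the "a", (ab)? then matches nothing,
# and the engine stops at the first overall success.
# The longer match "ab" (a? matching zero times) is never tried.
not_longest = re.match(r"a?(ab)?", "ab").group(0)  # "a"

print(greedy, lazy, not_longest)
```
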
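The alternation operator | also has a preference: in backtracking engines (Perl, PCRE and the like), alternatives are tried from left to right, and the first one that lets the overall match succeed wins, even if a later alternative would have matched more. You can fix your pattern by putting the longer alternative first: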

/\x87(\xa6\xF0\x9F|[\xa6-\xbf])/x
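
As a rough check, the two orderings can be compared like this. This is a sketch using Python's re module on a bytes pattern, which is my assumption rather than the engine from the question; the /x flag is dropped because the pattern contains no whitespace, and the variable names are made up for illustration:

```python
import re

data = b"\x87\xa6\xf0\x9f"

# Shorter alternative first: the character class matches \xa6,
# the overall match succeeds, and the second alternative is never tried.
short_first = re.compile(rb"\x87([\xa6-\xbf]|\xa6\xf0\x9f)")

# Longer alternative first: the three-byte sequence is tried before the class.
long_first = re.compile(rb"\x87(\xa6\xf0\x9f|[\xa6-\xbf])")

short_match = short_first.match(data).group(0)  # b"\x87\xa6"
long_match = long_first.match(data).group(0)    # b"\x87\xa6\xf0\x9f"

print(short_match, long_match)
```

The general rule of thumb this suggests: when one alternative is a prefix of another, list the longer one first.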

Upvotes: 2
