mathk

Reputation: 8143

Perl regexp /(\r\n|\r|\n)/

I want to know how this ambiguous pattern is resolved in Perl (and, more generally, in anything that uses libpcre):

/(\r\n|\r|\n)/

When the pattern sees \r\n, will it match once or twice? And what are the rules for this situation?

Thanks

Upvotes: 3

Views: 3040

Answers (4)

Alan Moore

Reputation: 75242

...perl (more generally everything that use libpcre)

Possible misconception here: Perl does not "use libpcre". The PCRE library is a separate project that came along after Perl, and mimics much of Perl's regex functionality. PHP and ActionScript use libpcre, but most "Perl-derived" flavors (like Python, Java, and .NET) implement their regex support natively.

But they all share the trait in question here: they settle for the first alternative that works, rather than hold out for the longest match as a text-directed engine would.
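To illustrate "first alternative that works" versus "longest match", here is a minimal sketch using Python's `re` module, on the assumption (per the list above) that it behaves like Perl on this point. The variable names are only for illustration.

```python
import re

text = "\r\n"

# Shorter alternative listed first: the engine accepts "\r" and stops,
# even though "\r\n" would have been a longer match at the same position.
first_wins = re.match(r"\r|\r\n", text).group(0)

# Longer alternative listed first: the same engine now returns "\r\n".
longer_first = re.match(r"\r\n|\r", text).group(0)

print(repr(first_wins))    # '\r'
print(repr(longer_first))  # '\r\n'
```

A text-directed engine that holds out for the longest match would be expected to return "\r\n" in both cases, regardless of the order of the alternatives.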

Upvotes: 1

NinjaCat

Reputation: 10204

It'll match it once. More here: http://technocage.com/~caskey/dos2unix/

Upvotes: 0

bcat

Reputation: 8941

It will try to match the pipe-separated alternatives in order, from left to right. Thus the first alternative will match the entire string "\r\n", and there will be only one match. There's no ambiguity here.
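A quick way to check the match count is to list every match of the pattern from the question. This sketch uses Python's `re` module as a stand-in for Perl, assuming the two agree on left-to-right alternation. The name `newline` is just illustrative.

```python
import re

# The pattern from the question, with "\r\n" as the first alternative.
newline = re.compile(r"(\r\n|\r|\n)")

# A lone CRLF is consumed as a single match, not as "\r" followed by "\n".
crlf_matches = newline.findall("\r\n")

# Mixed line endings: each terminator is matched exactly once.
mixed_matches = newline.findall("a\r\nb\rc\n")

print(crlf_matches)   # ['\r\n']
print(mixed_matches)  # ['\r\n', '\r', '\n']
```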

Upvotes: 1

Mark Byers

Reputation: 838696

It will match \r\n once because Perl uses a regex-directed engine which evaluates alternations eagerly. See here.

You can easily find out whether the regex flavor you intend to use has a text-directed or regex-directed engine. If backreferences and/or lazy quantifiers are available, you can be certain the engine is regex-directed. You can do the test by applying the regex /regex|regex not/ to the string "regex not". If the resulting match is only "regex", the engine is regex-directed. If the result is "regex not", then it is text-directed. The reason behind this is that the regex-directed engine is "eager".
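The test described above can be run in a few lines. This is a sketch in Python rather than Perl, so it reports on Python's `re` engine. The names `result` and `engine` are only for illustration.

```python
import re

# Apply /regex|regex not/ to the string "regex not".
result = re.search(r"regex|regex not", "regex not").group(0)

# An eager, regex-directed engine stops at the first alternative that works.
engine = "regex-directed" if result == "regex" else "text-directed"

print(result)  # regex
print(engine)  # regex-directed
```

The same check should carry over to any other flavor by swapping in that flavor's search call.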

Upvotes: 7
