Connor Clark

Reputation: 55

PCRE regex behaves differently when moved to subroutine

Using PCRE v8.42, I am trying to abstract a regex into a named subroutine, but when it's in a subroutine, it seems to behave differently.

This outputs 10/:

echo '10/' | pcregrep '(?:0?[1-9]|1[0-2])\/' 

This outputs nothing:

echo '10/' | pcregrep '(?(DEFINE)(?<MONTHNUM>(?:0?[1-9]|1[0-2])))(?&MONTHNUM)\/'

Are these two regular expressions not equivalent?

Upvotes: 2

Views: 41

Answers (1)

Wiktor Stribiżew

Reputation: 627469

In PCRE 8.x (which includes your v8.42) and in PCRE2 versions prior to 10.30, subroutine calls are always treated as atomic groups. Your (?(DEFINE)(?<MONTHNUM>(?:0?[1-9]|1[0-2])))(?&MONTHNUM)\/ regex is therefore equivalent to (?>0?[1-9]|1[0-2])\/. See this regex demo, where 10/ fails to match, which is the expected behavior for an atomic group.

There is no match because 0?[1-9] matched the 1 in 10/, and since backtracking into the group is not allowed, the second alternative was never tried ("entered"). The whole match then failed because there is no / after the 1.
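To see the effect outside pcregrep, here is a minimal Python sketch. Python's re module has no (?(DEFINE)...) or (?&name) support, so this is only an emulation: it relies on the fact that a lookahead is not re-entered once it succeeds, which makes (?=(X))\1 behave like (?>X). The names BACKTRACKING, ATOMIC_EMULATED and matched_text are made up for this illustration.

```python
import re

# Plain non-capturing group: the engine may backtrack into the alternation.
BACKTRACKING = re.compile(r"(?:0?[1-9]|1[0-2])/")

# Emulated atomic group: once the lookahead succeeds it is never retried,
# so (?=(X))\1 acts like (?>X). This stands in for the atomic subroutine
# call (?&MONTHNUM); it is an assumption-based emulation, not PCRE itself.
ATOMIC_EMULATED = re.compile(r"(?=(0?[1-9]|1[0-2]))\1/")


def matched_text(pattern, text):
    """Return the matched substring, or None when there is no match."""
    m = pattern.search(text)
    return m.group(0) if m else None


print(matched_text(BACKTRACKING, "10/"))     # 10/
print(matched_text(ATOMIC_EMULATED, "10/"))  # None
```

In the second pattern, the first alternative captures "1", the following / fails against "0", and the engine cannot go back to try 1[0-2].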

You need to make sure the longer alternative comes first:

(?(DEFINE)(?<MONTHNUM>(?:1[0-2]|0?[1-9])))(?&MONTHNUM)/

See the regex demo. Note that in the pcregrep pattern, you do not need to escape /.
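As a rough sanity check that the reordered alternation covers every month value even under atomic matching, the following Python sketch tries 1 to 12 and 01 to 09. It uses the same lookahead-plus-backreference emulation of an atomic group, since Python's re module does not support PCRE subroutine calls; REORDERED_ATOMIC and month_matches are hypothetical names used only here.

```python
import re

# Longer alternative first; (?=(X))\1 emulates the atomic group (?>X).
# This is an emulation of the PCRE behavior, not the PCRE engine itself.
REORDERED_ATOMIC = re.compile(r"(?=(1[0-2]|0?[1-9]))\1/")


def month_matches(text):
    """True when the whole string is a month number followed by a slash."""
    return REORDERED_ATOMIC.fullmatch(text) is not None


# All valid spellings: 1..12 plus the zero-padded forms 01..09.
months = [str(n) for n in range(1, 13)] + [f"{n:02d}" for n in range(1, 10)]
failures = [m for m in months if not month_matches(m + "/")]

print(failures)              # []
print(month_matches("13/"))  # False
```

Alternatives inside the group are still tried in order until one succeeds; atomicity only prevents returning to the group after it has matched, which is why putting 1[0-2] first is enough.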

Alternatively, you can use PCRE2 v10.30 or newer (its grep tool is pcre2grep), where subroutine calls are no longer treated as atomic.

Upvotes: 1
