user527388

Reputation: 21

Nested greedy quantifier not matching

I have noticed some strange behaviour with a PCRE regular expression I can't explain. I would expect the code:

preg_match('!^.+?(?:/programs/([^?#]+))?.*?$!',
    'http://example.com/programs/drive', $matches);

to return "drive" as match 1. The [^?#]+ and the ? after the non-capturing group are both greedy, so why doesn't the [^?#]+ take precedence and match drive? Instead, testing revealed that the .+? at the start matches the h and the .*? at the end matches the rest of the URL.

By contrast, the code:

preg_match('!^.+?(?:/programs/([^?#]+).*)?$!',
    'http://example.com/programs/drive', $matches);

works as expected and returns drive as match 1.
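One way to see what the second pattern is doing is to wrap the leading .+? in a capturing group of its own, so the match reports how much of the URL it consumed. This is a minimal sketch, assuming PHP 7.2 or later (for PREG_UNMATCHED_AS_NULL); the extra group is hypothetical instrumentation, not part of the original code, and it shifts "drive" from match 1 to match 2.

```php
<?php
// The second pattern from the question, with the leading .+? wrapped in an
// extra capturing group (hypothetical instrumentation). Capturing does not
// change which paths the engine tries, only what it reports.
$url = 'http://example.com/programs/drive';
$pattern = '!^(.+?)(?:/programs/([^?#]+).*)?$!';

// PREG_UNMATCHED_AS_NULL reports a group that took no part in the match as null.
preg_match($pattern, $url, $m, PREG_UNMATCHED_AS_NULL);

// Group 1: what the lazy .+? consumed. Group 2: the original "match 1".
printf("prefix:  %s\n", $m[1]);                // prefix:  http://example.com
printf("program: %s\n", $m[2] ?? '(not set)'); // program: drive
```

The lazy .+? ends up consuming everything before /programs/, because that is the first length at which the rest of the pattern, ending in $, can succeed.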

Upvotes: 2

Views: 88

Answers (1)

ridgerunner

Reputation: 34385

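What's happening is this. The first .+? is lazy, but it must still match at least one character, so it starts by matching just the h in http. The optional group (?:/programs/([^?#]+))? is then tried at the position right after the h. Its ? is greedy, so the engine does try the group first, but /programs/ cannot match at that position, and because the group is optional it simply matches nothing. Finally, the lazy .*? expands one character at a time until $ can match at the end of the string, and the overall match succeeds. Because this first path succeeds, the engine never backtracks into .+? to try a longer prefix, so the group never gets a chance to match at /programs/ and match 1 is left unset. Greediness only decides the order in which options are tried at the current position; it does not make the engine search ahead for a place where the group could match.

In your second pattern the group is followed directly by $. When the group matches nothing after the h, $ fails, so the engine backtracks and lets .+? grow one character at a time until the group can match at /programs/, which is why that version captures drive.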
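The trace for the first pattern can be checked by wrapping each of its unnamed parts in a capturing group. This is a minimal sketch, assuming PHP 7.2 or later; the extra groups are hypothetical instrumentation added only for this demonstration, so the original match 1 becomes group 2 here.

```php
<?php
// The first pattern from the question, with the leading .+? and trailing .*?
// wrapped in extra capturing groups (hypothetical instrumentation) so each
// part's share of the subject is visible.
$url = 'http://example.com/programs/drive';
$pattern = '!^(.+?)(?:/programs/([^?#]+))?(.*?)$!';

// PREG_UNMATCHED_AS_NULL reports a group that took no part in the match as
// null instead of an empty string.
preg_match($pattern, $url, $m, PREG_UNMATCHED_AS_NULL);

var_dump($m[1]); // string(1) "h"
var_dump($m[2]); // NULL: the optional group matched nothing
var_dump($m[3]); // string(32) "ttp://example.com/programs/drive"
```

Capturing parentheses only record what each part consumed; they do not change the order in which PCRE tries alternatives, so this reproduces the same split described in the question.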

Upvotes: 3
