Reputation: 21
I have noticed some strange behaviour with a PCRE regular expression I can't explain. I would expect the code:
preg_match('!^.+?(?:/programs/([^?#]+))?.*?$!',
'http://example.com/programs/drive', $matches);
to return "drive" as match 1. The [^?#]+ and the ? after the non-capturing group are both greedy, so why doesn't the [^?#]+ take precedence and match drive? Instead, testing revealed that the .+? at the start matches the h and the .*? at the end matches the rest of the URL.
By contrast, the code:
preg_match('!^.+?(?:/programs/([^?#]+).*)?$!',
'http://example.com/programs/drive', $matches);
works as expected and returns drive as match 1.
Upvotes: 2
Views: 88
Reputation: 34385
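What's happening is this. The first .+? is applied at the start of the string. It is lazy, so it consumes only the single character it requires, the h in http, and then hands control to the rest of the pattern. Next, (?:/programs/([^?#]+))? is tested at the following position, the t. The text there does not begin with /programs/, and because the whole group is optional, it matches nothing instead of failing. Finally, the .*?$ at the end of the pattern is applied, and this expression is able to match all the remaining characters in the string for a successful match.
The engine stops at the first overall match it finds, so it never goes back to let .+? grow far enough for the optional group to match. In your second pattern, nothing sits between the optional group and the $ to absorb the rest of the string, so skipping the group fails at $ and the engine has to keep extending .+? until /programs/ can match.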
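To see this for yourself, here is a small sketch that runs both patterns from the question against the same URL. The extra capture groups around the leading .+? and the trailing .*? in the first pattern are not in the original; they are added only to show what each part consumes, and they do not change how the pattern matches. The variable names are likewise just for illustration.

```php
<?php
$url = 'http://example.com/programs/drive';

// First pattern from the question, with two illustrative capture groups
// added: group 1 wraps the leading .+?, group 3 wraps the trailing .*?.
// Group 2 is the original ([^?#]+).
preg_match('!^(.+?)(?:/programs/([^?#]+))?(.*?)$!', $url, $first);

printf("lead: '%s'\n", $first[1]); // lead: 'h'
printf("prog: '%s'\n", $first[2]); // prog: '' (optional group was skipped)
printf("tail: '%s'\n", $first[3]); // tail: 'ttp://example.com/programs/drive'

// Second pattern from the question, unchanged. The only thing after the
// optional group is $, so the engine must backtrack into .+? until the
// group can match.
preg_match('!^.+?(?:/programs/([^?#]+).*)?$!', $url, $second);

printf("prog: '%s'\n", $second[1]); // prog: 'drive'
```

Note that in the unmodified first pattern the unmatched group is the last one, so preg_match() leaves it out of $matches altogether and only index 0 is set.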
Upvotes: 3