tonytz

Reputation: 135

question mark in regular expression

I saw this regular expression applied to a URL:

$url = 'http://www.domain.com/';
preg_match('/(http)(.*?)\n/', $url, $matches);

I am not sure what the question mark "?" does in this regular expression. According to the regex manuals, "?" is a metacharacter equivalent to {0,1}. So what is the point of putting "?" after a *, since * already means {0,}?

Can someone please enlighten me? Thanks.

Upvotes: 0

Views: 1168

Answers (2)

stema

Reputation: 92986

It has a different meaning when it follows another quantifier.

In this case it changes the matching behaviour of the preceding quantifier. The default behaviour is greedy, and the ? changes it to "ungreedy" (also called lazy).

  • "Greedy" means match as much as possible

  • "Ungreedy" means match as little as possible

See the article on regular-expressions.info

For example:

a.+b will match "aabxb" in aabxb

a.+?b will match only "aab" in aabxb
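
The two patterns above can be checked directly in PHP. This is only a minimal sketch; the variable names $greedy and $lazy are made up for illustration and are not part of the question's code.

```php
<?php
// Greedy: .+ grabs as much as it can, then backtracks to the LAST "b".
preg_match('/a.+b/', 'aabxb', $greedy);

// Ungreedy: .+? grabs as little as it can and stops at the FIRST "b" that works.
preg_match('/a.+?b/', 'aabxb', $lazy);

echo $greedy[0], "\n"; // aabxb
echo $lazy[0], "\n";   // aab
```

In both cases .+ still has to consume at least one character, which is why the ungreedy match is "aab" and not "ab".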

See the example here on Regexr

You may be interested in my blog post about this topic: You do know Quantifiers. Really?

About your regex

preg_match('/(http)(.*?)\n/', $url, $matches);

I don't think it makes a difference here. The . matches anything but newline characters by default (you can change this by adding an s after the closing regex delimiter), so whether or not the question mark is there, it will match only up to the first \n.
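
A quick way to see this, as a sketch: the $subject string below is a hypothetical input with two newlines, because the URL from the question contains no newline at all, so the pattern as written would not match it.

```php
<?php
// Hypothetical subject with two newlines (not the $url from the question).
$subject = "http://www.domain.com/\nsecond line\n";

preg_match('/(http)(.*)\n/', $subject, $greedy);
preg_match('/(http)(.*?)\n/', $subject, $lazy);

// Without the s modifier, . cannot cross a newline,
// so both variants stop at the first \n and capture the same text.
var_dump($greedy[2] === $lazy[2]); // bool(true)
echo $lazy[2], "\n";               // ://www.domain.com/
```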

If you change the behaviour by adding the s modifier, as in preg_match('/(http)(.*?)\n/s', $url, $matches);, it does make a difference: .*\n would match up to the last \n, while .*?\n stops at the first \n.
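
To illustrate the effect of the s modifier, here is a small sketch. The two-line $subject is a hypothetical input chosen so that there is more than one \n to stop at; it is not the string from the question.

```php
<?php
// Hypothetical two-line subject; with /s the dot also matches newlines.
$subject = "http://www.domain.com/\nsecond line\n";

preg_match('/(http)(.*)\n/s', $subject, $greedy);
preg_match('/(http)(.*?)\n/s', $subject, $lazy);

// Greedy: runs to the end, then backtracks to the LAST \n.
var_dump($greedy[2] === "://www.domain.com/\nsecond line"); // bool(true)

// Ungreedy: stops at the FIRST \n.
var_dump($lazy[2] === "://www.domain.com/");                // bool(true)
```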

Upvotes: 6

Dan Dascalescu

Reputation: 152095

In this case, the question mark means a "stingy" match. It will stop matching as soon as the first \n is encountered, while otherwise it would gobble up intervening \ns until the last one (provided the . is allowed to match newlines, e.g. with the s modifier).

More about greedy and stingy matching at http://www.perl.com/doc/FMTEYEWTK/regexps.html

Upvotes: 1
