limeandcoconut
limeandcoconut

Reputation: 408

Regex lazy quantifier asserted beginning and end

When quantifying a regex as lazy, such as \w{2,4}?, is it any more or less efficient than otherwise (greedy) when asserted as the beginning and end of the test?

e.g. ^\w{2,4}?$ vs ^\w{2,4}$ testing '12345' '1234'

Upvotes: 3

Views: 541

Answers (2)

Sanya_Zol
Sanya_Zol

Reputation: 498

No. Both greedy and lazy quantifiers create "backtracks".

However, there is possessive quantifier "+" which does not create backtracks, and, therefore, faster in some cases. It does have a very specific behavior and there is many regex engines that does not support possessive quantifier (javascript does not, PCRE does)

As Avinash Raj mentoined, backtracks will not be used in this specific example, and all those expressions

^\w{2,5}?$ ^\w{2,5}$ ^\w{2,5}+$ ^\w{2,4}?$ ^\w{2,4}$ ^\w{2,4}+$

will have no difference in time matching string 12345

see Avinash Raj's answer answer.

<?php header('Content-Type: text/plain');
$s = str_repeat("12345\n",1000);
function a($r){
    global $s;
    $r = '#'.$r.'#m';
    $time_a = (double)microtime(true);
    for($i=0;$i<100000;$i++)preg_match( $r, $s );
    echo "$r\t".( (double)microtime(true) - $time_a )."\n";
}
a('^\w{2,5}$');  // 0.093012094497681 ; 0.10101294517517 
a('^\w{2,5}?$'); // 0.095011949539185 ; 0.09201192855835 
a('^\w{2,5}+$'); // 0.094011783599854 ; 0.093512058258057
a('^\w{2,4}$');  // 4.3395512104034   ; 4.3435509204865    
a('^\w{2,4}?$'); // 4.3420507907867   ; 4.3610541820526    
a('^\w{2,4}+$'); // 4.3565537929535   ; 4.353052854538     
// lazy "?" quantifier is slower than greedy

Upvotes: 0

Avinash Raj
Avinash Raj

Reputation: 174844

Both won't match your test string because your input contains 5 characters where your regex only matches the string which has 2 to 4 characters. This ? won't be needed when anchors are used. ^.*?$ does the same as ^.*$. And i suggest you to use ^\w{2,4}$ instead of ^\w{2,4}?$. The second one takes 8 steps to find the match where the first one would take 5 steps to find a match.

DEMO

Let us consider 1234 as a sample string.

  1. ^\w{2,4}$

    From the start, the above regex would match all the chars which ranges from 2 to 4 characters greedily and the $ anchor asserts that we are at the end.

  2. ^\w{2,4}?$

    But when we use this, it matches upto the two characters and checks for end of the line. Because there isn't an end of the line anchor next to the second character, it matches the third character and again check for the end of the line anchor next to that third character. Result is no, so it moves to the next character and matches it. Now it check for the end of the line anchor. Yes, it's an end of the line. If it has any character following the fourth character then it discards it's operation. This is the reason for taking more number of steps than the greedy one.

Upvotes: 3

Related Questions