Veger

Reputation: 37905

preg_replace speed optimisation

While reading the accepted answer to a question about stripping every character except numbers from a string, I noticed that the author added a + after the character class:

$str = preg_replace('/[^0-9.]+/', '', $str);

in order to find sub-strings, instead of single occurrences, to remove. For the functionality the + is optional. But I started to wonder whether adding the + is faster or not. (Or is there not any difference?)
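To illustrate that the + does not change the result, here is a minimal sketch that runs both patterns on the same input. The sample string is a made-up value chosen only for illustration.

```php
<?php
// Hypothetical sample input, used only for illustration.
$input = 'Price: 1,234.50 USD';

// With +: each run of unwanted characters is removed in one match.
$withPlus = preg_replace('/[^0-9.]+/', '', $input);

// Without +: each unwanted character is removed as its own match.
$withoutPlus = preg_replace('/[^0-9.]/', '', $input);

var_dump($withPlus === $withoutPlus); // bool(true)
echo $withPlus, "\n";                 // 1234.50
```

Both calls return the same string, so the only open question is how long each takes.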

I would assume it is faster, because fewer replacements mean less string and memory handling. But I could also understand that more complex regular expressions are slower than simple ones.

So when using this technique to remove sub-strings, should one try to match large or small sub-strings?

Upvotes: 2

Views: 5842

Answers (3)

Veger

Reputation: 37905

I ran some speed tests, as chris suggested. Compared to his code I:

  • added a str_replace for comparison:
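// Characters handed to str_replace. Note that these are the digits and the dot,
// i.e. the characters the regex patterns keep rather than remove.
$str_replace_array = array('0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '.');

// Time one replacement; an empty pattern selects the str_replace variant.
function tst($pat, $str) {
    global $str_replace_array;
    $start = microtime(true);
    if ($pat === '') {
        str_replace($str_replace_array, '', $str);
    } else {
        preg_replace($pat, '', $str);
    }
    return microtime(true) - $start;
}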
  • made all strings the same length, so the results could be compared better

The results (times in seconds):

letters
         rep    0.00298
       norep    0.06953
 str_replace    0.00406

numbers
         rep    0.02867
       norep    0.02612
 str_replace    0.01242

mostly_letters
         rep    0.00931
       norep    0.06649
 str_replace    0.00593

mostly_numbers
         rep    0.03285
       norep    0.02942
 str_replace    0.01359

It shows that the repeating regex (with the + added) is much faster when large blocks are replaced (less memory handling?), but the non-repeating regex is slightly faster when not much needs to be replaced.

Furthermore, str_replace is almost always faster than the regex replacement (roughly twice as fast), except when a single regex match covers the complete string.

Upvotes: 1

goat

Reputation: 31813

Don't read too much into benchmark results. They're incredibly hard to do well. Really, the only thing you should take from this is that the repetition might be faster on certain types of strings, where the span of repetition is long.

This type of thing can easily change with a different version of PCRE.


function tst($pat, $str) {
    $start = microtime(true);
    preg_replace($pat, '', $str);
    return microtime(true) - $start;
}
$strs = array(
    'letters' => str_repeat("a", 20000),
    'numbers' => str_repeat("1", 20000),
    'mostly_letters' => str_repeat("aaaaaaaaaaaaa5", 20000),
    'mostly_numbers' => str_repeat("5555555555555a", 20000)
);
$pats = array(
    'rep' => '/[^0-9.]+/',
    'norep' => '/[^0-9.]/'
);

// Precompile the patterns (PHP caches them per script) and warm up microtime
microtime(true);
preg_replace($pats['rep'], '', 'foo');
preg_replace($pats['norep'], '', 'foo');

foreach ($strs as $strname => $str) {
    echo "$strname\n";
    foreach ($pats as $patname => $pat) {
        printf("%10s    %.5f\n", $patname, tst($pat, $str));
    }
}
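One common way to reduce noise in a micro-benchmark like the one above is to repeat each measurement and keep the best time. The following is only a sketch under assumptions: `bestTime` is a hypothetical helper, the run count of 5 is arbitrary, and `hrtime()` requires PHP 7.3 or later. It does not make the results any more portable across PCRE versions.

```php
<?php
// Hypothetical helper: run the replacement $runs times and return the
// shortest elapsed time in seconds.
function bestTime(string $pattern, string $subject, int $runs = 5): float
{
    $best = INF;
    for ($i = 0; $i < $runs; $i++) {
        $start = hrtime(true);                        // monotonic clock, nanoseconds
        preg_replace($pattern, '', $subject);
        $elapsed = (hrtime(true) - $start) / 1e9;     // convert to seconds
        $best = min($best, $elapsed);
    }
    return $best;
}

// Illustrative subject string; the size is an arbitrary choice.
$subject = str_repeat('aaaaaaaaaaaaa5', 2000);

foreach (['rep' => '/[^0-9.]+/', 'norep' => '/[^0-9.]/'] as $name => $pattern) {
    printf("%10s    %.6f\n", $name, bestTime($pattern, $subject));
}
```

Taking the minimum rather than the mean discards runs that were slowed down by unrelated activity on the machine, but the numbers still only describe one PHP build on one system.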

Upvotes: 1

mck89

Reputation: 19231

I haven't done any tests, but with the + each match covers more characters, so the replacement should be executed fewer times. Without the + in the regexp, the replacement is done for every single character instead of for an entire substring, so I think it's slower.
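The difference in the number of replacements can be checked with the optional fifth argument of preg_replace, which receives the replacement count. A small sketch, using a made-up input string:

```php
<?php
// Hypothetical sample input, used only for illustration.
$input = 'abc123def456';

// The fifth argument is filled with the number of replacements performed.
preg_replace('/[^0-9.]+/', '', $input, -1, $countWithPlus);
preg_replace('/[^0-9.]/', '', $input, -1, $countWithoutPlus);

printf("with +:    %d replacements\n", $countWithPlus);    // 2
printf("without +: %d replacements\n", $countWithoutPlus); // 6
```

Fewer replacements do not prove that the call is faster overall; that still has to be measured.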

Upvotes: 0
