Reputation: 37905
When looking at the accepted answer for stripping all characters from a string except numbers, I noticed that the author added a + after the character class in the expression
    $str = preg_replace('/[^0-9.]+/', '', $str);
in order to remove whole sub-strings instead of single characters. Functionally, the + is optional. But I started to wonder whether adding the + is faster or not (or is there no difference at all?).
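To illustrate that the + does not change the result, here is a minimal sketch. The input string is a made-up example, and the variable names are only for illustration:

```php
<?php
// Hypothetical sample input, chosen only for illustration.
$str = 'abc 12.5 def';

// Same character class, with and without the + quantifier.
$withPlus    = preg_replace('/[^0-9.]+/', '', $str);
$withoutPlus = preg_replace('/[^0-9.]/', '', $str);

// Both variants strip exactly the same characters.
var_dump($withPlus === $withoutPlus); // bool(true)
```

Only the number of individual matches differs, not the resulting string.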
I would assume it is faster, because there is less string and memory handling. But I could also understand that more complex regular expressions are slower than simple ones.
So when using this technique to remove sub-strings, should one try to match large or small sub-strings?
Upvotes: 2
Views: 5842
Reputation: 37905
I ran some speed tests as chris suggested. Compared to his code, I added a str_replace variant, which is selected by passing an empty pattern:
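    // Note: this list makes str_replace remove the digits and '.',
    // whereas the regex patterns remove everything except those characters.
    $str_replace_array = array('0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '.');

    function tst($pat, $str) {
        global $str_replace_array;
        $start = microtime(true);
        if ($pat == '') {
            // An empty pattern selects the str_replace variant.
            str_replace($str_replace_array, '', $str);
        } else {
            preg_replace($pat, '', $str);
        }
        return microtime(true) - $start;
    }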
The results (in seconds):
    letters
        rep          0.00298
        norep        0.06953
        str_replace  0.00406
    numbers
        rep          0.02867
        norep        0.02612
        str_replace  0.01242
    mostly_letters
        rep          0.00931
        norep        0.06649
        str_replace  0.00593
    mostly_numbers
        rep          0.03285
        norep        0.02942
        str_replace  0.01359
It shows that the repeating regex (with the + added) is much faster when replacing larger blocks (less memory handling?). But the non-repeating regex is slightly faster when not much needs to be replaced.
Furthermore, str_replace is basically always faster (about twice the speed) than the regex replacement, except when the regex matches the complete string.
Upvotes: 1
Reputation: 31813
Don't read too much into benchmark results. They're incredibly hard to do well. Really, the only thing you should take from this is that the repetition might be faster on certain types of strings, where the span of repetition is long.
This type of thing can easily change with a different version of PCRE.
    function tst($pat, $str) {
        $start = microtime(true);
        preg_replace($pat, '', $str);
        return microtime(true) - $start;
    }

    $strs = array(
        'letters'        => str_repeat("a", 20000),
        'numbers'        => str_repeat("1", 20000),
        'mostly_letters' => str_repeat("aaaaaaaaaaaaa5", 20000),
        'mostly_numbers' => str_repeat("5555555555555a", 20000)
    );

    $pats = array(
        'rep'   => '/[^0-9.]+/',
        'norep' => '/[^0-9.]/'
    );

    //precompile patterns(php caches them per script) and warm up microtime
    microtime(true);
    preg_replace($pats['rep'], '', 'foo');
    preg_replace($pats['norep'], '', 'foo');

    foreach ($strs as $strname => $str) {
        echo "$strname\n";
        foreach ($pats as $patname => $pat) {
            printf("%10s %.5f\n", $patname, tst($pat, $str));
        }
    }
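Since a single timed run is noisy, one possible refinement is to repeat each measurement and keep the fastest time. The sketch below is not part of the code above: tst_best and its $runs parameter are hypothetical names, and taking the minimum is just one common way to reduce noise, not a guarantee of a meaningful benchmark.

```php
<?php
// Hypothetical helper: run the replacement $runs times and keep the fastest time.
function tst_best(string $pat, string $str, int $runs = 5): float
{
    $best = INF;
    for ($i = 0; $i < $runs; $i++) {
        $start = microtime(true);
        preg_replace($pat, '', $str);
        $best = min($best, microtime(true) - $start);
    }
    return $best;
}

// Same "mostly letters" input as in the test strings above.
$str = str_repeat('aaaaaaaaaaaaa5', 20000);

printf("%10s %.5f\n", 'rep', tst_best('/[^0-9.]+/', $str));
printf("%10s %.5f\n", 'norep', tst_best('/[^0-9.]/', $str));
```

The absolute numbers will still depend on the machine, the PHP build and the PCRE version, so only the relative difference on the same system is worth comparing.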
Upvotes: 1
Reputation: 19231
I haven't done any tests, but with the + you match more characters at once, so the replace step should be executed fewer times. If you leave the + out of the regexp, the replacement is done for every single character instead of for an entire substring, so I think it's slower.
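The difference in the number of replacements can be checked with the optional count argument of preg_replace. This is a minimal sketch; the input string and variable names are made up for illustration:

```php
<?php
// Hypothetical sample input: two runs of letters around a run of digits.
$str = 'abc123def';

// The fifth argument of preg_replace receives the number of replacements made.
preg_replace('/[^0-9.]+/', '', $str, -1, $countRep);
preg_replace('/[^0-9.]/', '', $str, -1, $countNoRep);

echo "with +:    $countRep replacements\n";   // 2 (one per run of letters)
echo "without +: $countNoRep replacements\n"; // 6 (one per letter)
```

Fewer replacements does not automatically mean less total time, but it shows where the saving would come from.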
Upvotes: 0