Reputation: 19539
I was looking at some code and started thinking about the most efficient way to truncate a string (in this case, a URI) using preg_replace.
First off: I realize that using preg_replace in the first place might be overkill for this task, that it may be needlessly expensive, and that it might be better handled using PHP's string functions such as substr. I do know this.
That said, consider these two different regular expressions:
$uri = '/one/cool/uri'; // Desired result '/one/cool'
// Using a back-reference
$parent = preg_replace('#(.*)/.*#', "$1", $uri);
// Using character class negation
$parent = preg_replace('#/[^/]+$#', '', $uri);
By default I would assume that in the former case, creating the back-reference is going to be more expensive than not doing so, and therefore the second example would be preferable. But then I started wondering whether using [^/] in the second example might be more expensive than the corresponding . in the first example, and if so, by how much?
I prefer the first example from a readability standpoint, and since we're splitting hairs I lean towards choosing it between the two (after all, there's value in writing readable code too). May just be my personal preference though.
Thoughts?
Upvotes: 4
Views: 340
Reputation: 10015
I would also measure the running time of both options. This information from the docs may help too:
http://www.php.net/manual/en/regexp.reference.performance.php
If you are using such a pattern with subject strings that do not contain newlines, the best performance is obtained by setting PCRE_DOTALL, or starting the pattern with ^.* to indicate explicit anchoring. That saves PCRE from having to scan along the subject looking for a newline to restart at.
So, $parent = preg_replace('#^(.*)/.*#s', "$1", $uri); may speed up the first option. The second one would not need this setup:
s (PCRE_DOTALL)
If this modifier is set, a dot metacharacter in the pattern matches all characters, including newlines. Without it, newlines are excluded. This modifier is equivalent to Perl's /s modifier. A negative class such as [^a] always matches a newline character, independent of the setting of this modifier.
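To actually measure the options, something like the following rough sketch could be used. The time_pattern helper, the labels and the iteration count are made up for illustration, and hrtime assumes PHP 7.3 or later. PCRE caches compiled patterns, so this mostly times matching rather than compilation, and the numbers will vary by machine and PHP version.

```php
<?php
// Hypothetical helper (not part of PHP): runs preg_replace $iterations times
// for one pattern and returns the elapsed time in milliseconds.
function time_pattern(string $pattern, string $replacement, string $subject, int $iterations): float
{
    $start = hrtime(true); // monotonic clock in nanoseconds, PHP >= 7.3
    for ($i = 0; $i < $iterations; $i++) {
        preg_replace($pattern, $replacement, $subject);
    }
    return (hrtime(true) - $start) / 1e6;
}

$uri = '/one/cool/uri';

// The two patterns from the question plus the anchored DOTALL variant.
$candidates = [
    'back-reference'          => ['#(.*)/.*#', '$1'],
    'anchored + DOTALL'       => ['#^(.*)/.*#s', '$1'],
    'negated character class' => ['#/[^/]+$#', ''],
];

$results = [];
foreach ($candidates as $label => [$pattern, $replacement]) {
    // Keep the output of each pattern so the results can be compared.
    $results[$label] = preg_replace($pattern, $replacement, $uri);
    $ms = time_pattern($pattern, $replacement, $uri, 100000);
    printf("%-24s %-10s %8.2f ms\n", $label, $results[$label], $ms);
}
```

All three patterns should produce the same parent path for this URI, so whatever difference shows up is purely in the timing column. Running it against longer or more varied URIs would give a better picture than this single short string.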
Upvotes: 2