Reputation: 351
Perl v5.28.1
The benchmark:
use common::sense;
use Benchmark qw(:all);

my $UPPER = 10_000_000;
my $str   = 'foo bar baz';

cmpthese(10, {
    'empty for-loop' => sub {
        for my $i (1..$UPPER) {}
    },
    'regex match' => sub {
        for my $i (1..$UPPER) {
            $str =~ /foo/;
        }
    },
    'regex match (single compile)' => sub {
        my $re = qr/foo/;
        for my $i (1..$UPPER) {
            $str =~ $re;
        }
    },
    'regex match (anchor)' => sub {
        for my $i (1..$UPPER) {
            $str =~ /^foo/;
        }
    },
    'regex match (anchor) (single compile)' => sub {
        my $re = qr/^foo/;
        for my $i (1..$UPPER) {
            $str =~ $re;
        }
    },
});
The results:
                                      s/iter regex match (anchor) (single compile) regex match (single compile) regex match (anchor) regex match empty for-loop
regex match (anchor) (single compile)   3.83                                    --                         -21%                 -60%        -84%           -97%
regex match (single compile)            3.04                                   26%                           --                 -50%        -80%           -96%
regex match (anchor)                    1.53                                  151%                          99%                   --        -61%           -92%
regex match                            0.601                                  537%                         405%                 154%          --           -81%
empty for-loop                         0.117                                 3170%                        2496%                1205%        414%             --
Because foo occurs at the start of the string, I would expect adding an explicit anchor (^) to the regex to make no difference, not to halve performance!
I've also read that Perl is smart enough not to recompile fixed-string expressions, even when they appear inside loops.
So why would explicitly "precompiling" an expression into the variable $re cause such a performance hit?
I changed the search substring from "foo" to "asdf" (which does not occur in $str), and anchoring then does let the engine give up sooner. But assigning the expression to a variable is still a massive performance hit, much more than I would have expected:
                                         Rate regex match (single compile) regex match (anchor) (single compile) regex match regex match (anchor) empty for-loop
regex match (single compile)          0.401/s                           --                                  -10%        -79%                 -83%           -96%
regex match (anchor) (single compile) 0.447/s                          11%                                    --        -76%                 -81%           -95%
regex match                            1.88/s                         369%                                  321%          --                 -19%           -79%
regex match (anchor)                   2.33/s                         481%                                  421%         24%                   --           -75%
empty for-loop                         9.17/s                        2185%                                 1951%        387%                 294%             --
So, two questions to summarize:
- Why should a start-of-string anchor halve performance?
- Why should compiling an expression (qr//) into a variable be 80% slower than using the same expression inline?
Upvotes: 3
Views: 206
Reputation: 2403
Adding the anchor was preventing a particular regex optimisation from occurring. This has been fixed in 5.30.0.
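One way to see what the regex engine decides for the anchored pattern is the core re pragma in debug mode, which prints compile-time and run-time details to STDERR. This is only a minimal sketch: the exact output differs between Perl versions, and the variable $matched is introduced here purely for illustration.

```perl
use strict;
use warnings;

my $str = 'foo bar baz';
my $matched;

{
    # Lexically scoped: only regexes compiled inside this block
    # emit debugging output (written to STDERR).
    use re 'debug';
    $matched = ($str =~ /^foo/) ? 1 : 0;
}

print "matched: $matched\n";
```

Running the same script under 5.28 and under 5.30 and comparing the STDERR output is one way to look for the difference in how the anchored pattern is handled.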
Using a qr// object currently incurs a slight penalty, since part of the regex structure has to be copied internally (related to the fact that each regex object has its own set of capture indices). No one has thought of a good fix for this yet.
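To check how both effects behave on a given installation, a shortened version of the question's benchmark can be run under each Perl of interest. This is a rough sketch, not a rigorous measurement: the case labels and the inner loop of 1000 iterations are arbitrary choices made here, and the numbers will vary by machine and Perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $str = 'foo bar baz';
my $re  = qr/^foo/;    # compiled once, outside the timed code

# Print the running interpreter's version, e.g. 5.28.1
printf "Perl %vd\n", $^V;

# A negative count asks Benchmark to run each case for at least
# that many CPU seconds instead of a fixed number of iterations.
cmpthese(-1, {
    'inline'          => sub { $str =~ /foo/  for 1 .. 1000 },
    'inline (anchor)' => sub { $str =~ /^foo/ for 1 .. 1000 },
    'qr object'       => sub { $str =~ $re    for 1 .. 1000 },
});
```

If the anchor case is no longer slower than the plain inline case on 5.30.0 or later, that is consistent with the fix mentioned above; the qr object case would be expected to remain somewhat slower.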
Upvotes: 5