robut

Reputation: 351

Perl Regex Unexpected Performance Hit

Perl v5.28.1

The benchmark:

use common::sense;
use Benchmark qw(:all);

my $UPPER = 10_000_000;
my $str = 'foo bar baz';

cmpthese(10, {
    'empty for-loop' => sub {
        for my $i (1..$UPPER) {}
    },
    'regex match' => sub {
        for my $i (1..$UPPER) {
            $str =~ /foo/;
        }
    },
    'regex match (single compile)' => sub {
        my $re = qr/foo/;
        for my $i (1..$UPPER) {
            $str =~ $re;
        }
    },
    'regex match (anchor)' => sub {
        for my $i (1..$UPPER) {
            $str =~ /^foo/;
        }
    },
    'regex match (anchor) (single compile)' => sub {
        my $re = qr/^foo/;
        for my $i (1..$UPPER) {
            $str =~ $re;
        }
    },
});

The results:

                                      s/iter regex match (anchor) (single compile) regex match (single compile) regex match (anchor) regex match empty for-loop
regex match (anchor) (single compile)   3.83                                    --                         -21%                 -60%        -84%           -97%
regex match (single compile)            3.04                                   26%                           --                 -50%        -80%           -96%
regex match (anchor)                    1.53                                  151%                          99%                   --        -61%           -92%
regex match                            0.601                                  537%                         405%                 154%          --           -81%
empty for-loop                         0.117                                 3170%                        2496%                1205%        414%             --

Because foo happens to occur at the start of the string, I would expect adding an explicit anchor (^) to the regex to do nothing... not halve performance!

I've also read that Perl is smart enough not to recompile expressions with fixed strings, even when they appear inside loops.
But why would attempting to manually "precompile" an expression into the variable $re cause such a performance hit?

I changed the search substring "foo" to "asdf" (which does not occur in $str), and anchoring does let the engine drop out of searching sooner. But assigning the expression to a variable is still a massive performance hit, much more than I would have expected:

                                         Rate regex match (single compile) regex match (anchor) (single compile) regex match regex match (anchor) empty for-loop
regex match (single compile)          0.401/s                           --                                  -10%        -79%                 -83%           -96%
regex match (anchor) (single compile) 0.447/s                          11%                                    --        -76%                 -81%           -95%
regex match                            1.88/s                         369%                                  321%          --                 -19%           -79%
regex match (anchor)                   2.33/s                         481%                                  421%         24%                   --           -75%
empty for-loop                         9.17/s                        2185%                                 1951%        387%                 294%             --

So, two questions to summarize:
- Why should a start-of-string anchor halve performance?
- Why should compiling an expression (qr//) into a variable be 80% slower than using the same expression inline?

Upvotes: 3

Views: 206

Answers (1)

Dave Mitchell

Reputation: 2403

Adding the anchor was preventing a particular regex optimisation from occurring. This has been fixed in 5.30.0.

Using a qr// object currently incurs a slight penalty since internally part of the regex structure has to be copied (related to the fact that each regex object has its own set of capture indices). No one's thought of a good fix for this yet.
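To check how much of this applies to a given perl, the comparison from the question can be cut down to the two anchored cases. The following is only a minimal sketch: the helper names count_inline and count_qr and the iteration count of 100_000 are made up for illustration, and the numbers it prints will vary by machine and perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $str = 'foo bar baz';

# Literal pattern written inline in the match (hypothetical helper name).
sub count_inline {
    my ($n) = @_;
    my $hits = 0;
    for my $i (1 .. $n) {
        $hits++ if $str =~ /^foo/;
    }
    return $hits;
}

# Same pattern, but matched through a qr// object held in a variable
# (hypothetical helper name).
sub count_qr {
    my ($n) = @_;
    my $re   = qr/^foo/;
    my $hits = 0;
    for my $i (1 .. $n) {
        $hits++ if $str =~ $re;
    }
    return $hits;
}

# Print the running perl version, since the anchor behaviour is said
# to differ between 5.28 and 5.30.
printf "perl %vd\n", $^V;

# A negative count tells Benchmark to run each case for at least that
# many CPU seconds; 100_000 iterations per call is an arbitrary choice.
cmpthese(-1, {
    'inline' => sub { count_inline(100_000) },
    'qr'     => sub { count_qr(100_000) },
});
```

Both helpers return the number of successful matches, so it is easy to confirm they do the same work before comparing their rates. Running the script under 5.28 and again under 5.30 or later should show whether the anchor-related difference described above has gone away, while any remaining gap between the two rows would be down to the qr// copy.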

Upvotes: 5
