Evan Carroll


Why does the match operator's "match-only-once" optimization only apply with the "?" delimiter?

From the docs (perldoc -f m)

If ? is the delimiter, then a match-only-once rule applies, described in m?PATTERN? below.

The "match-only-once rule" doesn't seem to be defined anywhere, but it does seem to be a real optimization:

use strict;
use warnings;
use Benchmark qw(:all);
use constant HAYSTACK => "this is a test string";
my $needle = "test";

# A negative count (-1) runs each sub for at least 1 CPU second.
cmpthese(-1, {
    'questionmark'  => sub { if ( HAYSTACK =~ m?$needle?n ) { 1 } },
    'backslash'     => sub { if ( HAYSTACK =~ m/$needle/n ) { 1 } },
});

With the results:

                   Rate    backslash questionmark
backslash     9267717/s           --         -57%
questionmark 21588328/s         133%           --

This makes me wonder: what does m// do in scalar context that makes this optimization necessary? Take, for example, the output of

perl -E'say "FOOOOOO" =~ m/O/' # returns 1

If it isn't even counting the Os, what is it doing after the first match that makes it twice as slow?


Answers (2)

Evan Carroll


The confusion here is that "once" in "match-only-once" refers to how many times the m?? operator itself is executed, not to matching the needle once inside the haystack and ignoring later occurrences. If the same m?? is executed many times without an intervening reset, only the first successful execution returns a match; every later execution fails.

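use strict;
use warnings;
use feature 'say';

sub foo { return "foo" =~ m?o? }

say foo();    # 1
say foo();    # empty line: this m?? has already matched, so it fails
reset();      # re-arm the m?? operators in the current package (main)
say foo();    # 1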
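To make the per-execution behavior easier to see, here is a minimal sketch (the sub name try_match is made up for illustration) that runs the same m?? several times, records 1 for a match and 0 for a failure, and calls reset partway through:

```perl
use strict;
use warnings;

# One m?? operator; each call executes that same operator again.
sub try_match { return ("foo" =~ m?o?) ? 1 : 0 }

my @results;
push @results, try_match() for 1 .. 3;   # only the first call matches
reset;                                    # re-arm m?? operators in package main
push @results, try_match() for 1 .. 2;   # the first call after reset matches again
print "@results\n";                       # prints: 1 0 0 1 0
```

The haystack "foo" contains the needle every time; the zeros come purely from the operator having already been used.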


ruakh


The "match-only-once rule" doesn't seem to be defined anywhere, […]

"A match-only-once rule" is a description of the rule (a rule saying that m?PATTERN? matches only once), not an official name that you can search for. The text you quote is pulled from the perlop manpage, so when it says "described in m?PATTERN? below", it's referring to this part of that manpage:

m?PATTERN?msixpodualngc

This is just like the m/PATTERN/ search, except that it matches only once between calls to the reset() operator. This is a useful optimization when you want to see only the first occurrence of something in each file of a set of files, for instance. Only m?? patterns local to the current package are reset.

while (<>) {
    if (m?^$?) {
        # blank line between header and body
    }
} continue {
    reset if eof;    # clear m?? status for next file
}

Another example switched the first "latin1" encoding it finds to "utf8" in a pod file:

s//utf8/ if m? ^ =encoding \h+ \K latin1 ?x;
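The two manpage snippets combine naturally. Below is a minimal sketch, assuming in-memory strings stand in for real pod files (the @files data is made up for illustration), that switches only the first latin1 encoding in each "file" and calls reset between files so the m?? can fire again:

```perl
use strict;
use warnings;

# Hypothetical in-memory "pod files"; each string stands in for one file.
my @files = (
    "=encoding latin1\n\n=head1 NAME\n\n=encoding latin1\n",
    "=head1 NAME\n\n=encoding latin1\n",
);

my @output;
for my $text (@files) {
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    my $rewritten = '';
    while (<$fh>) {
        # m?? succeeds at most once per file; the empty s// pattern
        # reuses the last successful pattern, i.e. the m?? one.
        s//utf8/ if m? ^ =encoding \h+ \K latin1 ?x;
        $rewritten .= $_;
    }
    close $fh;
    reset;    # clear m?? status so the next file gets its own first match
    push @output, $rewritten;
}

print $output[0], "----\n", $output[1];
```

Without the reset, the second string would keep its latin1 line, because the m?? would already have been used up on the first one.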

This makes me wonder: what does m// do in scalar context that makes this optimization necessary?

Even in scalar context, m// or m?? may be executed many times between resets, and when that happens the two behave differently. (You can see this in the first snippet above. It's also the reason your benchmarks give different performance results: the version with m?$needle?n performs a regex match only the first time the sub is called and simply returns "no match" on every subsequent call, whereas the version with m/$needle/n performs a regex match every time.)
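One way to confirm this, sketched with the benchmark's own haystack and needle (the sub names are made up, and /n is dropped since there are no capture groups), is to count how many calls actually match:

```perl
use strict;
use warnings;

use constant HAYSTACK => "this is a test string";
my $needle = "test";

# The two benchmarked bodies, rewritten to report whether they matched.
sub with_slash    { return (HAYSTACK =~ m/$needle/) ? 1 : 0 }
sub with_question { return (HAYSTACK =~ m?$needle?) ? 1 : 0 }

my ($slash_hits, $question_hits) = (0, 0);
for (1 .. 1000) {
    $slash_hits    += with_slash();
    $question_hits += with_question();
}
print "m// matches: $slash_hits\n";      # m// matches: 1000
print "m?? matches: $question_hits\n";   # m?? matches: 1
```

So the "questionmark" entry in the benchmark is mostly timing an operator that fails immediately, not a regex search.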

