Reputation: 23633
Is there a way to find all possible start positions for a regex match in perl?
For example, if your regex was "aa" and the text was "aaaa", it would return 0, 1, and 2, instead of, say 0 and 2.
Obviously, you could just do something like return the first match, and then delete all characters up to and including that starting character, and perform another search, but I'm hoping for something more efficient.
Upvotes: 4
Views: 1742
Reputation: 40142
Update:
I thought about this one a bit more, and came up with this solution using an embedded code block, which is nearly three times faster than the grep
solution:
use 5.010;
use warnings;
use strict;
{my @pos;
my $push_pos = qr/(?{push @pos, $-[0]})/;
sub with_code {
my ($re, $str) = @_;
@pos = ();
$str =~ /(?:$re)$push_pos(?!)/;
@pos
}}
and for comparison:
sub with_grep { # old solution
my ($re, $str) = @_;
grep {pos($str) = $_; $str =~ /\G(?:$re)/} 0 .. length($str) - 1;
}
sub with_while { # per Michael Carman's solution, corrected
my ($re, $str) = @_;
my @pos;
while ($str =~ /\G.*?($re)/) {
push @pos, $-[1];
pos $str = $-[1] + 1
}
@pos
}
sub with_look_ahead { # a fragile "generic" version of Sean's solution
my ($re, $str) = @_;
my ($re_a, $re_b) = split //, $re, 2;
my @pos;
push @pos, $-[0] while $str =~ /$re_a(?=$re_b)/g;
@pos
}
Benchmarked and sanity checked with:
use Benchmark 'cmpthese';
my @arg = qw(aa aaaabbbbbbbaaabbbbbaaa);
my $expect = 7;
for my $sub qw(grep while code look_ahead) {
no strict 'refs';
my @got = &{"with_$sub"}(@arg);
"@got" eq '0 1 2 11 12 19 20' or die "$sub: @got";
}
cmpthese -2 => {
grep => sub {with_grep (@arg) == $expect or die},
while => sub {with_while (@arg) == $expect or die},
code => sub {with_code (@arg) == $expect or die},
ahead => sub {with_look_ahead(@arg) == $expect or die},
};
Which prints:
Rate grep while ahead code
grep 49337/s -- -20% -43% -65%
while 61293/s 24% -- -29% -56%
ahead 86340/s 75% 41% -- -38%
code 139161/s 182% 127% 61% --
Upvotes: 2
Reputation: 20280
I know you asked for a regex, but there is actually a simple builtin function that does something quite similar, the function index
(perldoc -f index
). From that we can build up a simple solution to your direct question, though if you really need a more complicated search than your example this will not work as it only looks for substrings (after an index given by the third parameter).
#!/usr/bin/env perl
use strict;
use warnings;
my $str = 'aaaa';
my $substr = 'aa';
my $pos = -1;
while (1) {
$pos = index($str, $substr, $pos + 1);
last if $pos < 0;
print $pos . "\n";
}
Upvotes: 1
Reputation: 149736
You can use global matching with the pos()
function:
my $s1 = "aaaa";
my $s2 = "aa";
while ($s1 =~ /aa/g) {
print pos($s1) - length($s2), "\n";
}
Upvotes: 0
Reputation: 29772
Use lookahead:
$ perl -le 'print $-[0] while "aaaa" =~ /a(?=a)/g'
In general, put everything except the first character of the regex inside of the (?=...)
.
Upvotes: 6