Reputation: 129481
I've been poring over perldoc perlre as well as the Regular Expressions Cookbook and related questions on Stack Overflow, and I can't seem to find what appears to be a very useful expression: how do I know the number of the current match?
There are expressions for the last closed group match ($^N), contents of match 3 (\g{3} if I understood the docs correctly), $', $& and $`. But there doesn't seem to be a variable I can use that simply tells me what the number of the current match is.
Is it really missing? If so, is there any explained technical reason why it is a hard thing to implement, or am I just not reading the perldoc carefully enough?
Please note that I'm interested in a built-in variable, NOT workarounds like using (?{ $count++ }).
For context, I'm trying to build a regular expression that would match only some instances of a match (e.g. match all occurrences of character "E" but do NOT match occurrences 3, 7 and 10 where 3, 7 and 10 are simply numbers in an array). I ran into this when trying to construct a more idiomatic answer to this SO question.
I want to avoid evaluating regexes as strings to actually insert 3, 7 and 10 into the regex itself.
Upvotes: 12
Views: 387
Reputation: 132865
I played around with this for a bit. Again, I know that this is not really what you are looking for, but I don't think that exists in the way you want it.
I had two thoughts. First, with a split using separator retention mode, you get the interstitial bits as the odd numbered elements in the output list. With the list from the split, you count which match you are on and put it back together how you like:
use v5.14;
$_ = 'ab1cdef2gh3ij4k5lmn6op7qr8stu9vw10xyz';
my @bits = split /(\d+)/; # separator retention mode
my @skips = qw(3 7 10);
my $s = '';
while( my( $index, $value ) = each @bits ) {
    # odd indices hold the matches; match number n sits at index 2n - 1
    if( $index % 2 and ! ( ( $index + 1 )/2 ~~ @skips ) ) {
        $s .= '^';
    }
    else {
        $s .= $value;
    }
}
say $s;
I get:
ab^cdef^gh3ij^k^lmn^op7qr^stu^vw10xyz
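To see why the matches land on the odd indices, here is a minimal sketch of what split returns when the pattern contains a capture group. The short input string is made up for illustration.

```perl
use strict;
use warnings;

# With a capturing group in the pattern, split returns the separators too,
# so the list alternates: even indices hold the text between matches,
# odd indices hold the captured matches themselves.
my @bits = split /(\d+)/, 'ab1cd22ef';

for my $index ( 0 .. $#bits ) {
    # match number n sits at index 2n - 1, so n = (index + 1) / 2
    my $kind = $index % 2 ? 'match #' . ( ( $index + 1 ) / 2 ) : 'text';
    print "$index: $bits[$index] ($kind)\n";
}
```

The same parity holds when the string starts with a match, because split then returns an empty leading field at index 0.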
I thought I really liked my split answer until I had the second thought. Does state work inside a substitution? It appears that it does:
use v5.14;
$_ = 'ab1cdef2gh3ij4k5lmn6op7qr8stu9vw10xyz';
my @skips = qw(3 7 10);
s/(\d+)/
    state $n = 0;
    $n++;
    $n ~~ @skips ? $1 : '$'
/eg;
say;
This gives me:
ab$cdef$gh3ij$k$lmn$op7qr$stu$vw10xyz
I don't think you can get much simpler than that, even if that magic variable existed.
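If you'd rather not depend on state and the smartmatch operator (~~ has been flagged experimental since 5.18), the same idea can be sketched with a plain counter and a hash lookup. The sub name replace_except and its argument order are made up for illustration; nothing here is built in.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every match of $re with $with, except the
# occurrences whose 1-based numbers are listed in @skips.
sub replace_except {
    my ( $string, $re, $with, @skips ) = @_;
    my %skip = map { $_ => 1 } @skips;    # hash lookup instead of ~~
    my $n    = 0;                         # fresh counter on every call
    $string =~ s/($re)/ $skip{ ++$n } ? $1 : $with /eg;
    return $string;
}

print replace_except( 'ab1cdef2gh3ij4k5lmn6op7qr8stu9vw10xyz',
    qr{\d+}, '$', 3, 7, 10 ), "\n";
```

One difference from the state version: a state variable keeps its value if the same statement runs again, while the lexical counter here starts from zero on each call.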
I had a third thought which I didn't try. I wonder if state works inside a code assertion. It might, but then I'd have to figure out how to use one of those to make a match fail, which really means it has to skip over the bit that might have matched. That seems really complicated, which is probably what Borodin was pressuring you to show even in pseudocode.
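For what it's worth, here is a rough sketch of how that third idea might look, using the "E" example from the question. It assumes a (?{ }) block to count attempts and a (?(?{ })...) conditional with (*FAIL) to reject the listed occurrences; the sub name and the package variables are made up for illustration. It only counts correctly because a single literal character gives the engine exactly one attempt per occurrence. With something like \d+, backtracking would run the counter more than once per number, which is the complication mentioned above.

```perl
use strict;
use warnings;

our ( $n, %skip );    # package variables, visible inside the regex code blocks

# Hypothetical helper: replace each "E" with "_" unless its 1-based
# occurrence number is listed in @skips.
sub replace_unskipped_E {
    my ( $string, @skips ) = @_;
    local $n    = 0;
    local %skip = map { $_ => 1 } @skips;
    $string =~ s/
        E
        (?{ $n++ })                     # count this occurrence
        (?(?{ $skip{$n} }) (*FAIL) )    # reject it if its number is listed
    /_/xg;
    return $string;
}

print replace_unskipped_E( 'aEbEcEdE', 2, 4 ), "\n";
```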
Upvotes: 5
Reputation: 132865
I'm completely ignoring the actual utility or wisdom of using this for the other question.
I thought @- or @+ might do what you want since they hold the offsets of the numbered matches, but it looks like the regex engine already knows what the last index will be:
use v5.14;
use Data::Printer;
$_ = 'abc123abc345abc765abc987abc123';
my @matches = m/
    ([0-9]+)
    (?{
        print 'Matched \$' . $#+ . " group with $^N\n";
        say p(@+);
    })
    .*?
    ([0-9]+)
    (?{
        print 'Matched \$' . $#+ . " group with $^N\n";
        say p(@+);
    })
/x;
say "Matches: @matches";
The output reports the last index ($#+) as 2 both times, even though the first code block runs before $2 has matched:
Matched \$2 group with 123
[
[0] 6,
[1] 6,
[2] undef
]
Matched \$2 group with 345
[
[0] 12,
[1] 6,
[2] 12
]
Matches: 123 345
Notice that the first time around, $+[2] is undef, so that one hasn't been filled in yet. You might be able to do something with that, but I think that's probably getting away from the spirit of your question. If you were really fancy, you could create a tied scalar that has the value of the last defined index in @+, I guess.
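As a related sketch: perlvar describes $#+ as the number of groups in the last successful pattern and $#- as the last group that actually matched. That distinction shows up once a match has finished; whether $#- is any use from inside a (?{ }) block mid-match is not something checked here. The pattern and input are made up for illustration.

```perl
use strict;
use warnings;

my ( $groups_in_pattern, $last_matched_group );

# Only the first alternative matches, so group 2 never participates.
if ( 'a' =~ /(a)|(b)/ ) {
    $groups_in_pattern  = $#+;    # number of groups in the pattern
    $last_matched_group = $#-;    # highest-numbered group that matched
}

print "groups in pattern:  $groups_in_pattern\n";
print "last matched group: $last_matched_group\n";
```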
Upvotes: 6