Reputation: 737
The issue: I'm writing a library that receives a user-supplied regex containing an unknown number of capture groups, to be run against other input, and I want to extract the values of all capture groups concatenated into one string (for further processing elsewhere).
It is trivial if the number of capture groups is known in advance, as I just specify them:
#!/usr/bin/perl -w
my $input = `seq -s" " 100 200`;
my $user_regex =
qr/100(.*)103(.*)107(.*)109(.*)111(.*)113(.*)116(.*)120(.*)133(.*)140(.*)145/;
if ($input =~ $user_regex) { print "$1 $2 $3 $4 $5 $6 $7 $8 $9 $10\n"; }
correctly produces (ignore the extra whitespace):
101 102 104 105 106 108 110 112 114 115 117 118 119
121 122 123 124 125 126 127 128 129 130 131 132
134 135 136 137 138 139 141 142 143 144
However, if there are more than 10 capture groups I lose data unless I modify the code. As the number of capture groups is unknown, I currently go with hundreds of manually specified matches ("$1" to "$200") under the no warnings pragma and hope that is enough, but it does not seem particularly clean or robust.
Ideally, I'd like something that works like values %+ does for named capture groups, but for unnamed capture groups. Is this possible in Perl 5.24? Or what less kludgy approach would you recommend for retrieving the contents of all numbered capture groups?
Upvotes: 3
Views: 3715
Reputation: 3777
Maybe you can just capture into an array?
my @captured = $input =~ $user_regex;
if( @captured ) { print join " ", @captured; print "\n"; }
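To make the list-context approach concrete for a pattern with more than ten groups, here is a minimal sketch. The helper name all_captures and the 12-group test pattern are made up for illustration; they are not from the question. It also guards one corner case: a successful match against a pattern with no capture groups returns the list (1), which should not be mistaken for a captured value.

```perl
use strict;
use warnings;

# Hypothetical helper (name invented for this sketch): returns every numbered
# capture of $re matched against $input, or an empty list if nothing was captured.
sub all_captures {
    my ($input, $re) = @_;
    my @captured = $input =~ $re;   # list context: one element per capture group
    return unless @captured;        # the match failed
    # A pattern without capture groups returns (1) on success;
    # $#+ is the number of groups in the last successful match.
    return if $#+ == 0;
    # Groups that did not take part in the match come back as undef.
    return map { defined($_) ? $_ : '' } @captured;
}

my $input   = join ' ', 100 .. 200;
my $pattern = '100 ' . join(' ', ('(\d+)') x 12) . ' 113';   # 12 groups
print join(' ', all_captures($input, qr/$pattern/)), "\n";
```

Mapping undef to an empty string is just one choice; keep the undef values if you need to tell "matched nothing" apart from "did not participate".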
If you absolutely must use the numbered capture variables, use eval:
my $input = "abc";
my $re = qr/(.)(.)(.)/;
if( $input =~ $re){
my $num = 1;
print "captured \$$num = ". eval("\$$num") ."\n" and $num++
while eval "defined \$$num";
}
Or just:
my $input = "abc";
my $re = qr/(.)(.)(.)/;
if( $input =~ $re){
my $num = 1;
print "captured \$$num = $$num\n" and $num++ while defined $$num;
}
...but this last example, which relies on symbolic references, doesn't work under use strict.
Upvotes: 7
Reputation: 479
The variables mentioned by Michael Carman and Borodin are helpfully documented together in perlvar - http://perldoc.perl.org/perlvar.html#Variables-related-to-regular-expressions.
That said, I combined ideas from several of the posts into what I hope is a more comprehensive answer:
#!/usr/bin/env perl
use Modern::Perl;
my @a = 'abcde' =~ /(.).(.).(.)/;
say do { # map probably creates a temp anonymous array of capture strings
no strict 'refs';
join ' ', map { "$$_" } 1..$#-
};
say do { # no copy to array but eval
eval '"' . join(" ", map { "\$$_" } 1..$#-) . '"';
};
say "@a"; # still not clear from OP why this wasn't the answer
Upvotes: -1
Reputation: 30831
For v5.24 there's no array of all captured values, but you can extract them using the start/end location of each match:
my $s = <some string>;
my $re = <some regex with captures>;
my @matches;
if ($s =~ $re) {
for my $i (1 .. $#-) {   # index 0 holds the offsets of the whole match, so the groups start at 1
push @matches, substr($s, $-[$i], $+[$i] - $-[$i]);
}
}
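A runnable version of the same idea might look like the sketch below. The helper name captures_by_offset and the sample date string are invented for illustration. It loops to $#+ (the number of groups in the pattern) and checks for undefined offsets, because a group that did not take part in the match has no entry in @- and @+.

```perl
use strict;
use warnings;

# Hypothetical helper: rebuild each capture from the offset arrays @- and @+.
sub captures_by_offset {
    my ($s, $re) = @_;
    my @matches;
    if ($s =~ $re) {
        # Index 0 describes the whole match, so the groups start at 1.
        for my $i (1 .. $#+) {
            # A group that did not participate has undefined offsets.
            push @matches, defined($-[$i])
                ? substr($s, $-[$i], $+[$i] - $-[$i])
                : undef;
        }
    }
    return @matches;
}

my @m = captures_by_offset('2024-05-17', qr/(\d+)-(\d+)-(\d+)/);
print "@m\n";   # prints "2024 05 17"
```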
Upvotes: 2
Reputation: 126722
If you are running Perl v5.26.2 (currently the most recent release) or later then you can use the built-in array @{^CAPTURE} instead of accessing the capture variables themselves.
Just like a normal array, the number of captures is scalar @{^CAPTURE}, and the indexes run from zero to $#{^CAPTURE}.
Note that the array is populated by the most recent successful pattern match, so just like the capture variables themselves you should check the status of a pattern match before using the contents of @{^CAPTURE}.
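As a rough sketch of how that could be used for the question's "all groups in one string" requirement (the helper name capture_string and the 12-group pattern are invented here, and the version line simply mirrors the release cited above):

```perl
use strict;
use warnings;
use v5.26.2;   # version cited in this answer; older perls have no @{^CAPTURE}

# Hypothetical helper: join all numbered captures into one string,
# or return undef when the pattern does not match.
sub capture_string {
    my ($input, $re) = @_;
    return undef unless $input =~ $re;   # check the match before reading the array
    # Element 0 of @{^CAPTURE} corresponds to $1; non-participating groups are undef.
    return join ' ', map { $_ // '' } @{^CAPTURE};
}

my $re = qr/(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)/;   # 12 groups
print capture_string('abcdefghijkl', $re), "\n";     # prints "a b c d e f g h i j k l"
```

If you need the values beyond the next pattern match, copy them into your own array first, since a later successful match overwrites @{^CAPTURE}.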
Upvotes: 4
Reputation: 612
You can treat the numbers in $1, $2, etc. as variable names and dereference them (symbolic references):
$t="abcdefghijklmnop";
$t=~/(.)(.)(.)(.)(.)(.)(.)/;
print $$_ for 1..10;
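Under strict, you can disable the reference check locally:
use strict;
my $t="abcdefghijklmnop";   # must be declared with my under strict
$t=~/(.)(.)(.)(.)(.)(.)(.)/;
{
no strict 'refs';   # only symbolic references need to be allowed here
print $$_ for 1..10;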
}
Or, you can put them in an array (taken from http://perldoc.perl.org/perlre.html)
use strict;
my $t="abcdefghijklmnop";
my @a=$t=~/(.)(.)(.)(.)(.)(.)(.)/;
print "@a";
Neither approach is perfect, though: using strict references means you have to know the names of your variables. Ideally, therefore, you know your variable names, i.e. how many capture groups you've used.
Upvotes: -1