Matija Nalis

Reputation: 737

Get the values of all Perl regex capture groups

The issue: I'm writing a library that receives a user-supplied regex containing an unknown number of capture groups, to be run against other input, and I want to extract the values of all capture groups concatenated into one string (for further processing elsewhere).

It is trivial if the number of capture groups is known in advance, as I just specify them:

#!/usr/bin/perl -w
my $input = `seq -s" " 100 200`;
my $user_regex = 
 qr/100(.*)103(.*)107(.*)109(.*)111(.*)113(.*)116(.*)120(.*)133(.*)140(.*)145/;

if ($input =~ $user_regex)  { print "$1 $2 $3 $4 $5 $6 $7 $8 $9 $10\n"; }

correctly produces (ignore the extra whitespace):

 101 102   104 105 106   108   110   112   114 115   117 118 119 
 121 122 123 124 125 126 127 128 129 130 131 132   
 134 135 136 137 138 139   141 142 143 144

However, if there are more than 10 capture groups, I lose data unless I modify the code. As the number of capture groups is unknown, I currently specify hundreds of matches manually ("$1" to "$200") under the no warnings pragma and hope it is enough, but that does not seem particularly clean or robust.

Ideally, I'd like something that works like values %+ does for named capture groups, but for unnamed capture groups. Is it possible in Perl 5.24? Or what less kludgy approach would you recommend for retrieving the contents of all numbered capture groups?

Upvotes: 3

Views: 3715

Answers (5)

Kjetil S.

Reputation: 3777

Maybe you can just capture into an array?

my @captured = $input =~ $user_regex;
if( @captured ) { print join " ", @captured; print "\n"; }
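
As a rough sketch, the list-context capture could be wrapped in a small helper. The name captures_as_string, the separator argument and the 12-group sample pattern are hypothetical, chosen only for illustration. Two edge cases are handled on the assumption that they matter for user-supplied patterns: a pattern with no capture groups returns (1) in list context on success, and groups that did not participate in the match come back as undef.

```perl
use strict;
use warnings;

# Hypothetical helper: returns all numbered captures joined by $sep,
# or undef (empty list in list context) if the pattern does not match.
sub captures_as_string {
    my ($input, $regex, $sep) = @_;
    $sep //= ' ';
    my @captured = $input =~ $regex or return;
    # With no capture groups, a successful list-context match returns (1),
    # so check the group count via $#+ and return an empty string instead.
    return '' if $#+ == 0;
    # Groups that did not participate in the match are undef.
    return join $sep, map { $_ // '' } @captured;
}

# Sample pattern with 12 groups, i.e. more than $1..$10 would cover.
my $twelve = join '', ('(.)') x 12;
print captures_as_string('abcdefghijkl', qr/$twelve/), "\n";
```

Nothing here depends on the number of groups, so the same call works whether the user's pattern has two groups or two hundred.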

If you absolutely must use the numbered capture variables, use eval:

my $input = "abc";
my $re = qr/(.)(.)(.)/;
if( $input =~ $re){
  my $num = 1;
  print "captured \$$num = ". eval("\$$num") ."\n" and $num++
    while eval "defined \$$num";
}

Or just:

my $input = "abc";
my $re = qr/(.)(.)(.)/;
if( $input =~ $re){
  my $num = 1;
  print "captured \$$num = $$num\n" and $num++ while defined $$num;
}

...but this last example uses symbolic references, so it doesn't work under use strict (specifically strict 'refs').

Upvotes: 7

mr_ron

Reputation: 479

The variables mentioned by Michael Carman and Borodin are helpfully documented together in perlvar - http://perldoc.perl.org/perlvar.html#Variables-related-to-regular-expressions.

That said, I combined ideas from several of the posts into what I hope is a more comprehensive answer:

#!/usr/bin/env perl

use Modern::Perl;

my @a = 'abcde' =~ /(.).(.).(.)/;

say do { # map probably creates a temp anonymous array of capture strings
    no strict 'refs';
    join ' ', map { "$$_" } 1..$#-
};

say do { # no copy to array but eval
    eval '"' . join(" ", map { "\$$_" } 1..$#-) . '"';
};

say "@a"; # still not clear from OP why this wasn't the answer

Upvotes: -1

Michael Carman

Reputation: 30831

In v5.24 there's no array of all captured values, but you can extract them using the start and end offsets of each group, which are stored in @- and @+:

my $s  = <some string>;
my $re = <some regex with captures>;
my @matches;
if ($s =~ $re) {
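    # Index 0 holds the offsets of the whole match; capture groups start at 1.
    # $#+ counts every group in the pattern, including ones that did not participate.
    for my $i (1 .. $#+) {
        push @matches, defined $-[$i] ? substr($s, $-[$i], $+[$i] - $-[$i]) : undef;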
    }
}
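
Since the snippet above uses placeholders, here is a runnable sketch of the same offset-based technique. The sample string and pattern are made up for illustration; the second group is optional so that the case of a group not participating in the match is visible. The loop starts at index 1 because index 0 of @- and @+ describes the whole match, not a capture group.

```perl
use strict;
use warnings;

# Sample values chosen for illustration; the second group does not participate.
my $s  = 'key=value';
my $re = qr/(\w+)(:)?=(\w+)/;

my @matches;
if ($s =~ $re) {
    # $-[0] and $+[0] delimit the whole match; groups start at index 1.
    for my $i (1 .. $#+) {
        push @matches,
            defined $-[$i] ? substr($s, $-[$i], $+[$i] - $-[$i]) : undef;
    }
}

# Show undef explicitly so the non-participating group is visible.
print join(',', map { $_ // 'undef' } @matches), "\n";
```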

Upvotes: 2

Borodin

Reputation: 126722

If you are running Perl v5.26.2 (currently the most recent release) or later, then you can use the built-in array @{^CAPTURE} instead of accessing the capture variables themselves.

Just like a normal array, the number of captures is scalar @{^CAPTURE}, and the indexes run from zero to $#{^CAPTURE}, so ${^CAPTURE}[0] holds the value of $1.

Note that the array is populated by the most recent successful pattern match, so, just as with the capture variables themselves, you should check the status of a pattern match before using the contents of @{^CAPTURE}.
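
A minimal sketch of how this might look, assuming the Perl version named above. The input and the pattern are illustrative stand-ins for the question's data, and the ' | ' separator is an arbitrary choice.

```perl
use strict;
use warnings;
use v5.26.2;   # version named in this answer for @{^CAPTURE}

my $input      = join ' ', 100 .. 120;
my $user_regex = qr/100 (.*) 103 (.*) 106 (.*) 110/;   # hypothetical pattern

my $joined;
if ($input =~ $user_regex) {
    # ${^CAPTURE}[0] corresponds to $1, ${^CAPTURE}[1] to $2, and so on.
    # Groups that did not participate are undef, mapped to '' here.
    $joined = join ' | ', map { $_ // '' } @{^CAPTURE};
    print "$joined\n";
}
```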

Upvotes: 4

hoffmeister

Reputation: 612

You can treat the numbers in $1, $2, etc. as variable names and access them through symbolic references:

$t="abcdefghijklmnop"; 
$t=~/(.)(.)(.)(.)(.)(.)(.)/; 
print $$_ for 1..10;

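You can bypass strict inside a limited scope:

use strict;
my $t = "abcdefghijklmnop";   # 'my' is required under use strict
$t =~ /(.)(.)(.)(.)(.)(.)(.)/;
{
    no strict 'refs';         # allow symbolic references in this block only
    print $$_ for 1..10;      # $8 to $10 are undef here and print nothing
}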

Or, you can put them in an array (taken from http://perldoc.perl.org/perlre.html)

use strict; 
my $t="abcdefghijklmnop"; 
my @a=$t=~/(.)(.)(.)(.)(.)(.)(.)/; 
print "@a";

Neither approach is perfect, though: using strict references means you have to know the names of your variables. Ideally, therefore, you know your variable names, i.e. how many capture groups you've used.

Upvotes: -1
