Reputation: 101

perl regex to capture repeating group

I want a regular expression that matches something at the beginning of a line, and then matches (and returns) all other words. For instance, given this line:

$line = "one two three etc";

I want something like this (that doesn't work):

@matches= $line=~ /^one(?:\s+(\S+))$/;

to return the words "two", "three", "etc" into @matches.

I'm not asking for other ways to get the words; I want to do it with a single regular expression. It seems simple, but I have not been able to come up with a solution.

Upvotes: 10

Answers (5)

Marvin Nimnull

Reputation: 21

The (?{...}) "execute code" construct can be used to record each intermediate capture as the group repeats.

Let's start from your code:

#!/usr/bin/perl

$line = "one two three etc";
@matches = ();
$line=~ /^one(?:\s+(\S+)(?{push @matches, $1}))+$/;
print join "\n", @matches;

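The @matches array will contain "two", "three", "etc", because (?{push @matches, $1}) runs after each repetition of the group matches and stores the value just captured. Note that the code block also runs on paths the engine later backtracks out of, so in a pattern that backtracks the array can pick up extra entries.

A more involved example that may make the approach clearer: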

#!/usr/bin/perl

# Slurp the sample data that follows __END__ (readable through the DATA handle).
my $text = do { local $/; <DATA> };
$text =~ m{cipher-suites:\s*\[[\r\n" ]+(?:([^\]]*?)[\r\n", ]+(?{push @r, $1}))+\]}sm;

print join "\n", @r;
__END__
cipher-suites: [
  "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384",
  "TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384",
  "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
  "TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256",
  "TLS_DHE_RSA_WITH_AES_256_GCM_SHA384",
  "TLS_DHE_RSA_WITH_AES_128_GCM_SHA256"
]

This extracts the cipher names into the array @r.

(?{...}) is a very powerful regexp extension; for instance, it can extend a regular expression so that it matches nested parentheses.

Upvotes: 0

vks

Reputation: 67988

^.*?\s\K|(\w+)

Try this. See the demo:

http://regex101.com/r/lS5tT3/2
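
A minimal sketch of how this pattern might be used from Perl, assuming it is applied with /g in list context (the answer does not say). The first alternative skips the leading word without setting group 1, so that match is expected to contribute an undef element, which is filtered out here. Note that the pattern does not check that the line actually starts with "one".

```perl
use strict;
use warnings;

my $line = "one two three etc";

# The first alternative (^.*?\s\K) consumes the leading word but
# leaves group 1 unset; the second alternative captures each later word.
my @raw = $line =~ /^.*?\s\K|(\w+)/g;

# Drop the entries that come from matches where group 1 did not participate.
my @matches = grep { defined } @raw;

print join(",", @matches), "\n";   # two,three,etc
```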

Upvotes: 4

Miller

Reputation: 35208

The easiest solution is probably to split after the fact:

use strict;
use warnings;

my $line = "one two three etc";

my @matches = $line =~ /^one\s+(.*)/ ? split(' ', $1) : ();

use Data::Dump;
dd @matches;

Outputs:

("two", "three", "etc")

However, it's also possible to use \G to continue from where the previous match left off and therefore find all the non-spaces using the /g modifier.

The only trick is to remember not to let \G match at the beginning of the string, so that the word "one" must match first:

my @matches = $line =~ /(?:^one|(?<!\A)\G)\s+(\S+)/g;

Upvotes: -1

Sam

Reputation: 20486

You cannot have an unknown number of capture groups. If you try to repeat a capturing group, the last instance will override the contents of the capture group:

Expression: ^one(?:\s+(\S+))+$
Capture #1: etc

Or:

Expression: ^one\s+(\S+)\s+(\S+)\s+(\S+)$
Capture #1: two
Capture #2: three
Capture #3: etc
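
Both behaviours can be checked from Perl with a short sketch that uses the sample line from the question (the variable names are only illustrative):

```perl
use strict;
use warnings;

my $line = "one two three etc";

# Repeated group: each repetition overwrites $1, so only the last word survives.
my @repeated = $line =~ /^one(?:\s+(\S+))+$/;

# One group per word: works only when the number of words is known in advance.
my @fixed = $line =~ /^one\s+(\S+)\s+(\S+)\s+(\S+)$/;

print "repeated: @repeated\n";   # repeated: etc
print "fixed: @fixed\n";         # fixed: two three etc
```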

I suggest either capturing the entire group and then splitting by spaces:

Expression: ^one\s+((?:\S+\s*)+)$
Capture #1: two three etc

Or you can do a global match and utilize \G and \K:

Expression: (?:^one|(?<!\A)\G).*?\K\S+
Match #1: two
Match #2: three
Match #3: etc
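
As a sketch, the two suggestions could look like this in Perl. The variable names are illustrative, and the second form relies on //g in list context returning each whole match when the pattern has no capture groups:

```perl
use strict;
use warnings;

my $line = "one two three etc";

# Option 1: capture everything after "one" in a single group, then split it.
my ($rest) = $line =~ /^one\s+((?:\S+\s*)+)$/;
my @by_split = defined $rest ? split(' ', $rest) : ();

# Option 2: global match; \G chains each match to the previous one and
# \K discards everything matched so far, leaving just the word.
my @by_global = $line =~ /(?:^one|(?<!\A)\G).*?\K\S+/g;

print "split:  @by_split\n";    # split:  two three etc
print "global: @by_global\n";   # global: two three etc
```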

Upvotes: 4

Casimir et Hippolyte

Reputation: 89639

To do that you need to use the \G anchor that matches the position at the end of the last match. When you build a pattern with this anchor, you can obtain contiguous results:

@matches = $line =~ /(?:\G(?!\A)|^one) (\S+)/g;
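
A short sketch of what "contiguous" means here: a line that begins with "one" yields every following word, while a line that does not begin with "one" yields nothing, because \G(?!\A) can only continue from an earlier match. The second sample line is made up for the comparison. The pattern uses a literal space, so it expects the words to be separated by single spaces.

```perl
use strict;
use warnings;

my $good = "one two three etc";
my $bad  = "zero two three";    # hypothetical line that lacks the "one" prefix

# The first match must start with ^one; later matches must start
# exactly where the previous one ended (\G) and not at the string start.
my @from_good = $good =~ /(?:\G(?!\A)|^one) (\S+)/g;
my @from_bad  = $bad  =~ /(?:\G(?!\A)|^one) (\S+)/g;

print "good: @from_good\n";               # good: two three etc
print "bad:  ", scalar(@from_bad), "\n";  # bad:  0
```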

Upvotes: 5
