Reputation: 101
I want a regular expression that matches something at the beginning of a line, and then matches (and returns) all other words. For instance, given this line:
$line = "one two three etc";
I want something like this (that doesn't work):
@matches= $line=~ /^one(?:\s+(\S+))$/;
to return into @matches, the words "two", "three", "etc".
I am not asking how to get the words some other way; I want to do it with a regular expression. It seems so simple, but I have not been able to come up with a solution.
Upvotes: 10
Views: 7551
Reputation: 21
The (?{...}) "execute code" construct can be used to save each intermediate capture of a repeated group as the match proceeds.
Let's start from your code:
#!/usr/bin/perl
$line = "one two three etc";
@matches = ();
$line=~ /^one(?:\s+(\S+)(?{push @matches, $1}))+$/;
print join "\n", @matches;
The @matches array will contain "two", "three", "etc", because (?{push @matches, $1}) runs after each repetition of the group and stores the value that was just captured.
A more complicated example may make the approach clearer:
#!/usr/bin/perl
while (<DATA>) { $a .= $_; }   # slurp the sample input that follows __END__
$a =~ m{cipher-suites:\s*\[[\r\n" ]+(?:([^\]]*?)[\r\n", ]+(?{push @r, $1}))+\]}sm;
print join "\n", @r;
__END__
cipher-suites: [
"TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384",
"TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384",
"TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
"TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256",
"TLS_DHE_RSA_WITH_AES_256_GCM_SHA384",
"TLS_DHE_RSA_WITH_AES_128_GCM_SHA256"
]
This will extract the ciphers into the array @r.
The (?{...}) "execute code" construct is a very powerful regexp extension. It could, for instance, extend regular expressions with the ability to match nested parenthesized expressions.
Upvotes: 0
Reputation: 35208
The easiest solution is probably to split after the fact:
use strict;
use warnings;
my $line = "one two three etc";
my @matches = $line =~ /^one\s+(.*)/ ? split(' ', $1) : ();
use Data::Dump;
dd @matches;
Outputs:
("two", "three", "etc")
However, it's also possible to use \G to continue from where the previous match left off, and therefore find all the runs of non-spaces using the /g modifier.
The only trick is to remember not to let \G match at the beginning of the string, so that the word one must match first:
my @matches = $line =~ /(?:^one|(?<!\A)\G)\s+(\S+)/g;
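As a quick check of the \G pattern above, here is a small sketch that wraps it in a sub and runs it on one line that starts with one and one that does not. The sub name words_after_one and the second sample line are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the words that follow a leading "one",
# or an empty list when the line does not start with "one".
sub words_after_one {
    my ($line) = @_;
    # ^one starts the chain; (?<!\A)\G continues only from the end of a
    # previous match, never from the start of the string.
    my @words = $line =~ /(?:^one|(?<!\A)\G)\s+(\S+)/g;
    return @words;
}

my @found   = words_after_one("one two three etc");
my @nothing = words_after_one("zero two three");

print join(",", @found), "\n";     # two,three,etc
print scalar(@nothing), "\n";      # 0
```

Without the (?<!\A) guard, \G would also match at position 0 and the second line would yield its words as well.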
Upvotes: -1
Reputation: 20486
You cannot have an unknown number of capture groups. If you try to repeat a capturing group, the last instance will override the contents of the capture group:
^one(?:\s+(\S+))+$
etc
Or:
^one\s+(\S+)\s+(\S+)\s+(\S+)$
two
three
etc
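The overwriting behaviour described above is easy to confirm in Perl. This is only a minimal sketch using the question's sample line; the sub name repeated_group_captures is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the repeated-group pattern and returns
# whatever the match yields in list context.
sub repeated_group_captures {
    my ($line) = @_;
    # One pair of capturing parentheses means one slot in the result,
    # no matter how many times the enclosing (?:...)+ repeats.
    my @captures = $line =~ /^one(?:\s+(\S+))+$/;
    return @captures;
}

my @captures = repeated_group_captures("one two three etc");
print scalar(@captures), "\n";   # 1
print "$captures[0]\n";          # etc
```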
I suggest either capturing the entire group and then splitting by spaces:
^one\s+((?:\S+\s*)+)$
two three etc
Or you can do a global match and utilize \G and \K:
(?:^one|(?<!\A)\G).*?\K\S+
two
three
etc
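In Perl, the \G and \K pattern above could be used roughly like this. It is a sketch, assuming Perl 5.10 or later for \K; the sub name words_via_keep is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper built around the \G and \K pattern.
sub words_via_keep {
    my ($line) = @_;
    # There are no capture groups here, so /g in list context returns the
    # whole match each time; \K drops everything matched before it.
    my @words = $line =~ /(?:^one|(?<!\A)\G).*?\K\S+/g;
    return @words;
}

print join("\n", words_via_keep("one two three etc")), "\n";
```

For the sample line this should print two, three and etc on separate lines, matching the results listed above.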
Upvotes: 4
Reputation: 89639
To do that you need to use the \G anchor that matches the position at the end of the last match. When you build a pattern with this anchor, you can obtain contiguous results:
@matches = $line =~ /(?:\G(?!\A)|^one) (\S+)/g;
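To illustrate what "contiguous" means here, the following sketch runs the pattern on the question's line and on a variant with a wider gap. The sub name contiguous_words and the second sample line are assumptions added for the demonstration.

```perl
use strict;
use warnings;

# Hypothetical helper built around the pattern above.
sub contiguous_words {
    my ($line) = @_;
    # \G(?!\A) resumes exactly where the previous match ended, but never
    # at the very start of the string, so the first match must be ^one.
    my @words = $line =~ /(?:\G(?!\A)|^one) (\S+)/g;
    return @words;
}

print join(",", contiguous_words("one two three etc")), "\n";   # two,three,etc
print join(",", contiguous_words("one two  three")), "\n";      # two
```

The second line has two spaces before three. Because the pattern allows exactly one space between words, the chain cannot continue from where the previous match ended, and nothing after the gap is returned.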
Upvotes: 5