Reputation: 4325
I know this may be a common question and a duplicate, but I don't know how to express it well. For example, using Perl,
@arr = "a bb ccc" =~ /\b(\w+)\b/g;
can successfully get the three words.
But if I add one condition, that the line must start with a specific word such as begin (which should be excluded from the result array):
@arr = "begin:a bb ccc" =~ /begin:.*\b(\w+)\b/g;
This time the array contains only the last match, ccc.
How should I write a correct regex?
Upvotes: 3
Views: 1188
Reputation: 18490
Seems that you're looking for contiguous matching.
The \G assertion can be used to chain global matches
@arr = ("begin: a bb ccc" =~ /(?:^begin:|\G(?!^))\h*(\w+)\b/g);
See demo at eval.in
(?:^begin:|\G(?!^))
This part binds the matches to begin: at the ^ start of the string. \G matches at the end of the previous match. Without (?!^), \G would also match at the start of the string.
\h*(\w+)\b
This part matches any amount (*) of \h horizontal whitespace, followed by the group (\w+), which captures one or more word characters to $1, provided a \b word boundary follows.
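As a rough, self-contained sketch of the pattern explained above (the sub name words_after_begin is made up for illustration and is not part of the original answer), the effect of the (?!^) guard can be seen by comparing a line that has the begin: prefix with one that does not:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the words that follow a leading "begin:".
sub words_after_begin {
    my ($line) = @_;
    # In list context, /g returns every capture of group 1.
    # (?!^) stops \G from matching at position 0 when "begin:" is absent.
    return $line =~ /(?:^begin:|\G(?!^))\h*(\w+)\b/g;
}

print join(',', words_after_begin('begin: a bb ccc')), "\n";   # a,bb,ccc
print join(',', words_after_begin('begin:a bb ccc')), "\n";    # a,bb,ccc

my @none = words_after_begin('a bb ccc');   # no "begin:" prefix
print scalar(@none), "\n";                  # 0
```

\h needs Perl 5.10 or later.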
Instead of \h*, use [^\w\n]* to match any characters between the words that are neither word characters nor newlines. To match begin: anywhere in the string, remove the ^ start anchor.
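A minimal sketch of that variation, under two assumptions: the separator class is written with the * quantifier so that input like begin:a (no separator after the colon) still matches, and the sub name words_after_begin_loose is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper for illustration only.
sub words_after_begin_loose {
    my ($line) = @_;
    # No ^ before "begin:", so the marker may appear anywhere in the string.
    # [^\w\n]* skips any run of characters that are neither word
    # characters nor newlines (spaces, commas, semicolons, ...).
    return $line =~ /(?:begin:|\G(?!^))[^\w\n]*(\w+)\b/g;
}

print join(',', words_after_begin_loose('x begin: a, bb; ccc')), "\n";   # a,bb,ccc
```

Note that without the ^ anchor the pattern will also pick up begin: at the end of a longer token, so a stricter prefix may be needed depending on the data.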
Also see demo at regex101
\G is especially useful for matching an "anchored" sequence and extracting each subsequent match.
Upvotes: 1
Reputation: 126722
You don't say very much about your data, especially what happens if there is no begin at the start of the line. But you probably want split instead of a regex pattern. Something like this:
use strict;
use warnings 'all';
use feature 'say';
my $s = 'begin:a bb ccc';
my @arr;
if ( $s =~ /\Abegin:(.*)/ ) {
    @arr = split ' ', $1;
}
else {
    say 'No "begin"';
}
say join ',', @arr;
a,bb,ccc
Upvotes: 0