Lei Yang

Reputation: 4325

Regex to return all words in a line that starts with a specific word

I know this may be a common question and a duplicate, but I don't know how to express it well. For example, using Perl,

@arr = "a bb ccc" =~ /\b(\w+)\b/g;

can successfully get the three words.
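In list context, a //g match runs repeatedly and returns every capture from every successful iteration, which is why the snippet above yields three elements. Here is a minimal self-contained version, with `my` added so it runs under `use strict`:

```perl
use strict;
use warnings;

# List context + /g: the match is repeated and all $1 values are returned
my @arr = "a bb ccc" =~ /\b(\w+)\b/g;

print join(',', @arr), "\n";    # prints a,bb,ccc
```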

But if I add one condition, that the line must start with a specific word such as begin (which should be excluded from the result array):

@arr = "begin:a bb ccc" =~ /begin:.*\b(\w+)\b/g;

This time the array contains only the last match, ccc.
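The cause is that every iteration of the global match has to match begin: again. On the first iteration the greedy .* runs to the end of the string and backtracks only as far as the last word, so ccc is captured; the next iteration starts after ccc, finds no second begin:, and stops. A lazy .*? does not fix this, it only captures the first word instead. A quick check of both quantifiers:

```perl
use strict;
use warnings;

my $s = 'begin:a bb ccc';

# Greedy .* swallows the rest of the line, then backtracks just far
# enough for \b(\w+)\b to match the last word.
my @greedy = $s =~ /begin:.*\b(\w+)\b/g;

# Lazy .*? stops as early as possible, so the first word is captured.
my @lazy = $s =~ /begin:.*?\b(\w+)\b/g;

# Either way the string holds only one "begin:", so /g succeeds once.
print "greedy: @greedy\n";    # greedy: ccc
print "lazy:   @lazy\n";      # lazy:   a
```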

How should I write a correct regex?

Upvotes: 3

Views: 1188

Answers (2)

bobble bubble

Reputation: 18490

It seems that you're looking for contiguous matching.

The \G assertion can be used to chain global matches:

@arr = ("begin: a bb ccc" =~ /(?:^begin:|\G(?!^))\h*(\w+)\b/g);
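A self-contained sketch of this chaining, wrapped in a sub whose name, words_after_begin, is made up for illustration. It uses the (?!^) guard explained in the notes that follow, so a line without begin: yields nothing:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the words after a leading "begin:", or an
# empty list if the string does not start with "begin:". The first match
# is anchored at ^begin:; every later match must start exactly where the
# previous one ended (\G), and (?!^) keeps \G from matching at position 0.
sub words_after_begin {
    my ($line) = @_;
    return $line =~ /(?:^begin:|\G(?!^))\h*(\w+)\b/g;
}

my @with    = words_after_begin('begin:a bb ccc');
my @spaced  = words_after_begin('begin: a bb ccc');
my @without = words_after_begin('a bb ccc');

print join(',', @with), "\n";      # a,bb,ccc
print join(',', @spaced), "\n";    # a,bb,ccc
print scalar(@without), "\n";      # 0
```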

See demo at eval.in

  • (?:^begin:|\G(?!^)) binds the matches to begin: at the ^ start of the string.
    \G matches where the previous match ended. Without (?!^), \G would also match at the start, so a line without begin: would still return its words.

  • \h*(\w+)\b matches any amount (*) of \h horizontal whitespace, then captures one or more word characters with (\w+) into $1, followed by a \b word boundary.

  • Instead of \h*, use [^\w\n]* to allow any characters that are neither word characters nor newlines in between. To match begin: anywhere in the string, remove the ^ start anchor.
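A sketch of both variations from the last note combined: [^\w\n]* lets punctuation separate the words, and dropping ^ lets begin: appear anywhere. The sample strings are made up for illustration:

```perl
use strict;
use warnings;

# Unanchored begin:, with any run of non-word, non-newline characters
# allowed between words. Because \n is excluded, the chain of \G
# matches stops at the end of the line.
my $re = qr/(?:begin:|\G(?!^))[^\w\n]*(\w+)\b/;

my @words = 'x begin:a, bb; ccc' =~ /$re/g;    # x comes before begin:, so it is skipped
my @line1 = "begin:a bb\nccc" =~ /$re/g;       # ccc is on the next line

print join(',', @words), "\n";    # a,bb,ccc
print join(',', @line1), "\n";    # a,bb
```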

Also see demo at regex101

\G is especially useful to match an "anchored" sequence and extract each subsequent match.
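The same anchoring idea also works as an explicit loop in scalar context: each successful //g match leaves pos() at its end, and \G resumes from there. A minimal sketch of that style:

```perl
use strict;
use warnings;

my $s = 'begin:a bb ccc';
my @words;

# A scalar-context //g match sets pos($s) to just after "begin:".
if ($s =~ /\Abegin:/g) {
    # Each iteration must start exactly at pos($s); the loop ends at the
    # first gap that \h*(\w+) cannot bridge.
    while ($s =~ /\G\h*(\w+)/g) {
        push @words, $1;
    }
}

print join(',', @words), "\n";    # a,bb,ccc
```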

Upvotes: 1

Borodin

Reputation: 126722

You don't say very much about your data, especially what should happen if there is no begin at the start of the line. But you probably want split rather than a single regex pattern.

Something like this:

use strict;
use warnings 'all';
use feature 'say';

my $s = 'begin:a bb ccc';

my @arr;

if ( $s =~ /\Abegin:(.*)/ ) {
    @arr = split ' ', $1;
}
else {
    say 'No "begin"';
}

say join ',', @arr;

output

a,bb,ccc
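The split ' ' used here is Perl's special awk-style form: leading whitespace is discarded and any run of whitespace counts as one separator, so extra spaces after begin: do no harm. A literal-space pattern behaves differently, as this sketch with a made-up input shows:

```perl
use strict;
use warnings;

my $s = 'begin:  a bb   ccc';
my ($rest) = $s =~ /\Abegin:(.*)/;    # "  a bb   ccc"

# ' ' (a string of one space) triggers awk-style splitting
my @awk = split ' ', $rest;

# / / is an ordinary pattern: every single space is a separator, so
# leading and doubled spaces produce empty fields
my @plain = split / /, $rest;

print scalar(@awk), "\n";      # 3
print scalar(@plain), "\n";    # 7
```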

Upvotes: 0
