Jesse Walton

Reputation: 147

Regex: Capturing repeating group of groups (Perl)

In Perl, I am trying to capture the words as tokens from the following example strings (there will always be at least one word):

"red"               ==>    $1 = 'red';
"red|white"         ==>    $1 = 'red'; $2 = 'white';
"red|white|blue"    ==>    $1 = 'red'; $2 = 'white'; $3 = 'blue';
etc.

The pattern I see here is: WORD, followed by n sets of "|WORD" [n >= 0]

So from that, I have:

/(\w+)((?:\|)(\w+)*)/

Which, to my understanding, will always match the first WORD and, if a |WORD pair exists, capture that as many times as needed.

This doesn't work though, and I've tried several versions like:

/^(\w+)(\|(\w+))*$/

... what am I missing?

Upvotes: 1

Views: 626

Answers (1)

ruakh

Reputation: 183602

Your first regex is actually wrong: the * applies only to the trailing (\w+), not to the whole |WORD pair, so the | is required exactly once. I'll focus on your second regex, which is correct:

/^(\w+)(\|(\w+))*$/

The problem is that this regex has three capture groups: (\w+), (\|(\w+)), and (\w+). So it will populate, at most, three match variables: $1, $2, and $3. Each match variable corresponds to exactly one capture group, and a group that repeats under * keeps only the text from its last iteration. That is not what you want.
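
To see this concretely, here is a minimal sketch that runs the second regex against the three-word example and collects the match variables. The names $input and @captures are just illustrative, not anything from the question:

```perl
use strict;
use warnings;

my $input = "red|white|blue";
my @captures;

if ($input =~ /^(\w+)(\|(\w+))*$/) {
    # Groups 2 and 3 sit under the * quantifier, so they are overwritten
    # on every repetition and end up holding only the final "|WORD" pair.
    @captures = ($1, $2, $3);
    print join(", ", map { "'$_'" } @captures), "\n";
}
# prints: 'red', '|blue', 'blue'
```

Note that 'white' is matched along the way but is not available afterwards in any match variable.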

What you should do instead is use split:

my @words = split /\|/, "red|white|blue";

# now $words[0] is 'red', $words[1] is 'white', $words[2] is 'blue'
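
If you also want to reject malformed input (for example an empty word, as in "red||blue"), one possible approach is to validate the whole string first and only then split it. This is a rough sketch, assuming that "word" means \w+ as in your regex; the helper name tokens is made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the list of words, or an empty list
# if the string is not of the form WORD(|WORD)*.
sub tokens {
    my ($string) = @_;

    # Validate only; (?:...) groups without capturing.
    # \A and \z anchor to the true start and end of the string.
    return unless $string =~ /\A\w+(?:\|\w+)*\z/;

    return split /\|/, $string;
}

my @words = tokens("red|white|blue");
print scalar(@words), " words: @words\n";   # prints: 3 words: red white blue
```

Here the regex only checks the shape of the input, and split does the actual tokenizing, so the number of words is not limited by the number of capture groups.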

Upvotes: 2
