In Perl, I am trying to capture the words as tokens from the following example strings (there will always be at least one word):
"red" ==> $1 = 'red';
"red|white" ==> $1 = 'red'; $2 = 'white';
"red|white|blue" ==> $1 = 'red'; $2 = 'white'; $3 = 'blue';
etc.
The pattern I see here is: WORD, followed by n sets of "|WORD" [n >= 0]
So from that, I have:
/(\w+)((?:\|)(\w+)*)/
Which, to my understanding, will always match the first WORD and, if a |WORD pair exists, capture that pair as many times as needed.
This doesn't work, though, and I've tried several versions like:
/^(\w+)(\|(\w+))*$/
... what am I missing?
Your first regex is actually wrong (the * is in the wrong place), but I'll focus on your second regex, which is correct:
/^(\w+)(\|(\w+))*$/
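The problem is that this regex has three capture groups: (\w+), (\|(\w+)), and the inner (\w+). So it will populate at most three match variables: $1, $2, and $3. Each match variable corresponds to exactly one capture group, and when a group is repeated by *, it keeps only the text from its last iteration. For "red|white|blue" you therefore get $1 = 'red', $2 = '|blue', and $3 = 'blue'; the 'white' is lost. That is not what you want.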
What you should do instead is use split:
my @words = split /\|/, "red|white|blue";
# now $words[0] is 'red', $words[1] is 'white', $words[2] is 'blue'
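If you also want to reject malformed input such as "red||blue", one option is to first validate the shape described in the question (WORD, then zero or more "|WORD") using a non-capturing group, and only then split. The helper name parse_words is hypothetical; this is a sketch under that assumption, not the only way to do it.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the words of a "WORD|WORD|..." string,
# or an empty list if the string doesn't have that shape.
sub parse_words {
    my ($str) = @_;

    # WORD followed by zero or more "|WORD". (?:...) groups without
    # capturing, so the regex only validates. \z (not $) rejects a
    # trailing newline.
    return () unless $str =~ /^\w+(?:\|\w+)*\z/;

    # split does the extraction and returns as many words as there are.
    return split /\|/, $str;
}

for my $input ("red", "red|white", "red|white|blue") {
    my @words = parse_words($input);
    print "$input => ", join(", ", map { "'$_'" } @words), "\n";
}
# red => 'red'
# red|white => 'red', 'white'
# red|white|blue => 'red', 'white', 'blue'
```

Because the regex here only checks the format, it needs no capture groups at all, which sidesteps the fixed number of $1, $2, ... variables entirely.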