Log

Reputation: 483

Why does adding one word break the correct output of a regex (Perl)?

I want to understand how this regular expression behaves in Perl.

$str = "123-abc 23-rr";

I need to capture the words on both sides of each minus sign. The regular expression is:

@mas=$str=~/(?:([\d\w]+)\-([\d\w]+))/gx;

It gives the right output: 123, abc, 23, rr. But if I change the string a little and put one word at the start:

$str = "word 123-abc 23-rr";

I want to take this first word into account, so I change my regexp:

@mas=$str=~/\w+\s(?:\s*([\d\w]+)\-([\d\w]+))*/gx;

The output should be the same, but I get only: 23, rr. If I remove either \s* or the trailing *, the output is 123, abc, which is still not right. Does anyone know why?

Upvotes: 0

Views: 118

Answers (2)

Schwern

Reputation: 164769

Rather than making an ever more specific regex for an ever more specific string, consider taking advantage of the overall pattern.

  1. Each piece is separated by whitespace.
  2. The first piece is a word.
  3. The rest are pairs separated by dashes.

First split the pieces on whitespace.

my @pieces = split /\s+/, $str;

Then remove the first piece; it doesn't need to be split.

my $word = shift @pieces;

Then split each piece on - into pairs.

my %pairs = map { split /-/, $_ } @pieces;
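
Put together, the steps above might look like the sketch below. It assumes the input always has exactly one leading word followed by dash-separated pairs, and it collects the pairs into a flat array (here called @mas, as in the question) instead of a hash, on the assumption that the original order matters.

```perl
use strict;
use warnings;

my $str = "word 123-abc 23-rr";

# Step 1: split the string into whitespace-separated pieces.
my @pieces = split /\s+/, $str;    # ("word", "123-abc", "23-rr")

# Step 2: take the leading word off the front.
my $word = shift @pieces;          # "word"

# Step 3: split every remaining piece on the dash.
# A flat array keeps the original order; a hash would not.
my @mas = map { split /-/, $_ } @pieces;

print "$word: @mas\n";             # prints "word: 123 abc 23 rr"
```

A hash is still a reasonable choice when the left-hand side of each pair is a unique key and order is irrelevant.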

Upvotes: 1

ikegami

Reputation: 385657

For each match, each capture is returned.


In the first snippet, the pattern matches twice.

123-abc 23-rr
\_____/ \___/

There are two captures, so four (2*2=4) values are returned.
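
To see the individual matches, the same pattern can be run in scalar context, where /g finds one match per loop iteration. This is only an illustrative sketch; the loop and the print statements are not part of the original code.

```perl
use strict;
use warnings;

my $str = "123-abc 23-rr";
my @mas;

# In scalar context, each evaluation of the /g match finds the next match.
while ($str =~ /(?:([\d\w]+)\-([\d\w]+))/gx) {
    print "matched '$&' with captures '$1' and '$2'\n";
    push @mas, $1, $2;    # two captures per match
}

# Two matches times two captures gives four values.
print scalar(@mas), " values: @mas\n";    # prints "4 values: 123 abc 23 rr"
```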


In the second snippet, the pattern matches once.

word 123-abc 23-rr
\________________/

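There are two captures, so two (2*1=2) values are returned. Because both captures sit inside the repeated (?:...)* group, each one keeps only the text from the last repetition, which is why the values are 23 and rr.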
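
A short sketch comparing the two patterns from the question on the longer string. The variable names @per_pair and @one_match are made up for illustration. The first pattern needs no change to cope with the leading word, because "word" is not followed by a dash and is simply skipped.

```perl
use strict;
use warnings;

my $str = "word 123-abc 23-rr";

# Pattern from the first snippet: it matches twice (123-abc and 23-rr),
# so both captures are returned for each of the two matches.
my @per_pair = $str =~ /(?:([\d\w]+)\-([\d\w]+))/gx;

# Pattern from the second snippet: it matches once, covering the whole
# string. The repeated group runs twice inside that single match, and
# each capture keeps only what it matched on the last repetition.
my @one_match = $str =~ /\w+\s(?:\s*([\d\w]+)\-([\d\w]+))*/gx;

print scalar(@per_pair),  " values: @per_pair\n";     # prints "4 values: 123 abc 23 rr"
print scalar(@one_match), " values: @one_match\n";    # prints "2 values: 23 rr"
```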

Upvotes: 1
