Oleksiy

Reputation: 91

Repeating quantifier is ignored in regex

I am stuck building a regex with a repetition quantifier, and I have had no luck finding advice online.

Here is the string to match:

abc cde fgi

The regex is

^(?:(.*?)(abc|fgi)){2}(.*)$

Here is example output from redemo.py:

[Image: redemo.py output showing how the regex matches the string]

I get similar output from Perl:

perl --version | head -2; perl -MData::Dumper -e 'print Dumper ["abc cde fgi" =~ /^(?:(.*?)(abc|fgi)){2}(.*)$/g]'

This is perl 5, version 14, subversion 4 (v5.14.4) built for cygwin-thread-multi
$VAR1 = [
          ' cde ',
          'fgi',
          ''
        ];

Why I have to use exactly this regex is out of scope here.

But here is the problem: I would expect the '{2}' quantifier to be a strict requirement for matching the string, so that the interpreter returns 5 groups for a successful match:

1: ''
2: 'abc'
3: ' '
4: 'fgi'
5: ''

Unexpectedly, the interpreter returns only 3 groups, so it looks like the '{2}' quantifier is being ignored.

Could someone tell me whether my understanding of regex repetition quantifiers is wrong?

Can anyone recommend a tool to visualize how a regex is interpreted step by step?

Thanks,

Upvotes: 1

Views: 374

Answers (1)

Kilian Foth

Reputation: 14346

The {2} is not ignored, but it sits outside the capturing groups. The repetition is enforced when deciding whether the string matches, but it does not multiply the groups reported for the match. Each capturing group keeps only the text from its last iteration, so you get one set of captures for a group that matched twice. To get the text spanned by both repetitions assigned to a single group, wrap the quantified expression, {2} included, in its own ().
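A minimal sketch of this behaviour in Python's re module (my choice of language; the question only shows redemo.py and Perl output). The variable names are illustrative. It runs the regex from the question as-is, then a variant in which the quantified part is wrapped in an extra capturing group.

```python
import re

text = "abc cde fgi"

# The regex from the question: the {2} is enforced, but each capturing
# group only remembers what it matched on its last iteration.
original = re.compile(r"^(?:(.*?)(abc|fgi)){2}(.*)$")
groups = original.match(text).groups()
print(groups)  # (' cde ', 'fgi', '')

# Variant: an extra pair of () around the quantified expression, so that
# group 1 holds the text spanned by both repetitions together.
wrapped = re.compile(r"^((?:(.*?)(abc|fgi)){2})(.*)$")
wrapped_groups = wrapped.match(text).groups()
print(wrapped_groups)  # ('abc cde fgi', ' cde ', 'fgi', '')

# A string with only one occurrence fails, so the {2} is not ignored.
print(original.match("abc cde") is None)  # True
```

The first result has the same three values as the Perl output in the question. The second shows that the wrapping group captures the whole repeated span, not the individual repetitions.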

Note that you never get more groups assigned than there are pairs of capturing () in the pattern. To get the individual repetitions of a group, you have to code a loop and repeat the match (or, in Perl, embed code in the regex with the (?{ ... }) construct).
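One way to code such a loop, sketched in Python under the assumption that the repeated unit is exactly (.*?)(abc|fgi) as in the question. The names unit, pieces and pos are hypothetical. It applies the unit twice, each time starting where the previous match ended, and then takes the rest of the string as the final piece.

```python
import re

text = "abc cde fgi"

# The repeated unit from the question, without the {2}.
unit = re.compile(r"(.*?)(abc|fgi)")

pieces = []
pos = 0
for _ in range(2):  # the loop count plays the role of {2}
    m = unit.match(text, pos)  # anchored at pos, like one iteration of the group
    if m is None:
        raise ValueError("fewer than two repetitions found")
    pieces.extend(m.groups())
    pos = m.end()

# Whatever is left corresponds to the trailing (.*) for a single-line string.
pieces.append(text[pos:])
print(pieces)  # ['', 'abc', ' cde ', 'fgi', '']

# For a fixed, small count, the pattern can instead be written out in full,
# which gives one pair of () per wanted group.
unrolled = re.match(r"^(.*?)(abc|fgi)(.*?)(abc|fgi)(.*)$", text).groups()
print(unrolled)  # ('', 'abc', ' cde ', 'fgi', '')
```

Both approaches return five pieces. The third piece is ' cde ', the text between the two keywords.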

Upvotes: 1
