user557597

Need confirmation of capture group numbering in nested branch resets

When nesting branch resets, I've made my best guess as to how the group numbering works. I've searched the internet for information on the nested case and couldn't find a definitive confirmation.

Mostly the concern is what happens to the numbering immediately following an inner nested branch reset.
The sample below is my best guess; if anybody could confirm it's correct or steer me in the right direction, it would be appreciated.

Sample regex:

(a)(?|x(y)z(?|(u)(u)(u)(u)(u)(u)|(e)(e)(e)|(c))(K)|(p(q(?|(M)(M)(M)(M)(?|(T)(T)(T)|(D)(D))(R)(R)|(B)(B)(B)|(v)))r)(o)(i)|(t)s(w))(Z)

Number Sequenced regex:

1    ( a )
     (?|
          x
2         ( y )
          z
          (?|
3              ( u )
4              ( u )
5              ( u )
6              ( u )
7              ( u )
8              ( u )
            |  
3              ( e )
4              ( e )
5              ( e )
            |  
3              ( c )
          )
9         ( K )
       |  
2         (
               p
  3            (
                    q
                    (?|
    4                    ( M )
    5                    ( M )
    6                    ( M )
    7                    ( M )
                         (?|
    8                         ( T )
    9                         ( T )
    10                        ( T )
                           |  
    8                         ( D )
    9                         ( D )
                         )
    11                   ( R )
    12                   ( R )
                      |  
    4                    ( B )
    5                    ( B )
    6                    ( B )
                      |  
    4                    ( v )
                    )
  3            )
               r
2         )
13        ( o )
14        ( i )
       |  
2         ( t )
          s
3         ( w )
     )
15   ( Z )

Perl Test Case:

Formatted:

 # (a)(?|x(y)z(?|(u)(u)(u)(u)(u)(u)|(e)(e)(e)|(c))(K)|(p(q(?|(M)(M)(M)(M)(?|(T)(T)(T)|(D)(D))(R)(R)|(B)(B)(B)|(v)))r)(o)(i)|(t)s(w))(Z)

 ( a )                         # (1)
 (?|
      x
      ( y )                         # (2)
      z
      (?|
           ( u )                         # (3)
           ( u )                         # (4)
           ( u )                         # (5)
           ( u )                         # (6)
           ( u )                         # (7)
           ( u )                         # (8)
        |  
           ( e )                         # (3)
           ( e )                         # (4)
           ( e )                         # (5)
        |  
           ( c )                         # (3)
      )
      ( K )                         # (9)
   |  
      (                             # (2 start)
           p
           (                             # (3 start)
                q
                (?|
                     ( M )                         # (4)
                     ( M )                         # (5)
                     ( M )                         # (6)
                     ( M )                         # (7)
                     (?|
                          ( T )                         # (8)
                          ( T )                         # (9)
                          ( T )                         # (10)
                       |  
                          ( D )                         # (8)
                          ( D )                         # (9)
                     )
                     ( R )                         # (11)
                     ( R )                         # (12)
                  |  
                     ( B )                         # (4)
                     ( B )                         # (5)
                     ( B )                         # (6)
                  |  
                     ( v )                         # (4)
                )
           )                             # (3 end)
           r
      )                             # (2 end)
      ( o )                         # (13)
      ( i )                         # (14)
   |  
      ( t )                         # (2)
      s
      ( w )                         # (3)
 )
 ( Z )                         # (15)

Perl engine results:
Input

axyzuuuuuuKZ
axyzeeeKZ
axyzcKZ
apqMMMMTTTRRroiZ
apqMMMMDDRRroiZ
apqBBBroiZ
apqvroiZ
atswZ

Output

 **  Grp 0 -  ( pos 0 , len 12 ) 
axyzuuuuuuKZ  
 **  Grp 1 -  ( pos 0 , len 1 ) 
a  
 **  Grp 2 -  ( pos 2 , len 1 ) 
y  
 **  Grp 3 -  ( pos 4 , len 1 ) 
u  
 **  Grp 4 -  ( pos 5 , len 1 ) 
u  
 **  Grp 5 -  ( pos 6 , len 1 ) 
u  
 **  Grp 6 -  ( pos 7 , len 1 ) 
u  
 **  Grp 7 -  ( pos 8 , len 1 ) 
u  
 **  Grp 8 -  ( pos 9 , len 1 ) 
u  
 **  Grp 9 -  ( pos 10 , len 1 ) 
K  
 **  Grp 10 -  NULL 
 **  Grp 11 -  NULL 
 **  Grp 12 -  NULL 
 **  Grp 13 -  NULL 
 **  Grp 14 -  NULL 
 **  Grp 15 -  ( pos 11 , len 1 ) 
Z  

-----------------------

 **  Grp 0 -  ( pos 14 , len 9 ) 
axyzeeeKZ  
 **  Grp 1 -  ( pos 14 , len 1 ) 
a  
 **  Grp 2 -  ( pos 16 , len 1 ) 
y  
 **  Grp 3 -  ( pos 18 , len 1 ) 
e  
 **  Grp 4 -  ( pos 19 , len 1 ) 
e  
 **  Grp 5 -  ( pos 20 , len 1 ) 
e  
 **  Grp 6 -  NULL 
 **  Grp 7 -  NULL 
 **  Grp 8 -  NULL 
 **  Grp 9 -  ( pos 21 , len 1 ) 
K  
 **  Grp 10 -  NULL 
 **  Grp 11 -  NULL 
 **  Grp 12 -  NULL 
 **  Grp 13 -  NULL 
 **  Grp 14 -  NULL 
 **  Grp 15 -  ( pos 22 , len 1 ) 
Z  

-----------------------

 **  Grp 0 -  ( pos 25 , len 7 ) 
axyzcKZ  
 **  Grp 1 -  ( pos 25 , len 1 ) 
a  
 **  Grp 2 -  ( pos 27 , len 1 ) 
y  
 **  Grp 3 -  ( pos 29 , len 1 ) 
c  
 **  Grp 4 -  NULL 
 **  Grp 5 -  NULL 
 **  Grp 6 -  NULL 
 **  Grp 7 -  NULL 
 **  Grp 8 -  NULL 
 **  Grp 9 -  ( pos 30 , len 1 ) 
K  
 **  Grp 10 -  NULL 
 **  Grp 11 -  NULL 
 **  Grp 12 -  NULL 
 **  Grp 13 -  NULL 
 **  Grp 14 -  NULL 
 **  Grp 15 -  ( pos 31 , len 1 ) 
Z  

-----------------------

 **  Grp 0 -  ( pos 34 , len 16 ) 
apqMMMMTTTRRroiZ  
 **  Grp 1 -  ( pos 34 , len 1 ) 
a  
 **  Grp 2 -  ( pos 35 , len 12 ) 
pqMMMMTTTRRr  
 **  Grp 3 -  ( pos 36 , len 10 ) 
qMMMMTTTRR  
 **  Grp 4 -  ( pos 37 , len 1 ) 
M  
 **  Grp 5 -  ( pos 38 , len 1 ) 
M  
 **  Grp 6 -  ( pos 39 , len 1 ) 
M  
 **  Grp 7 -  ( pos 40 , len 1 ) 
M  
 **  Grp 8 -  ( pos 41 , len 1 ) 
T  
 **  Grp 9 -  ( pos 42 , len 1 ) 
T  
 **  Grp 10 -  ( pos 43 , len 1 ) 
T  
 **  Grp 11 -  ( pos 44 , len 1 ) 
R  
 **  Grp 12 -  ( pos 45 , len 1 ) 
R  
 **  Grp 13 -  ( pos 47 , len 1 ) 
o  
 **  Grp 14 -  ( pos 48 , len 1 ) 
i  
 **  Grp 15 -  ( pos 49 , len 1 ) 
Z  

-----------------------

 **  Grp 0 -  ( pos 52 , len 15 ) 
apqMMMMDDRRroiZ  
 **  Grp 1 -  ( pos 52 , len 1 ) 
a  
 **  Grp 2 -  ( pos 53 , len 11 ) 
pqMMMMDDRRr  
 **  Grp 3 -  ( pos 54 , len 9 ) 
qMMMMDDRR  
 **  Grp 4 -  ( pos 55 , len 1 ) 
M  
 **  Grp 5 -  ( pos 56 , len 1 ) 
M  
 **  Grp 6 -  ( pos 57 , len 1 ) 
M  
 **  Grp 7 -  ( pos 58 , len 1 ) 
M  
 **  Grp 8 -  ( pos 59 , len 1 ) 
D  
 **  Grp 9 -  ( pos 60 , len 1 ) 
D  
 **  Grp 10 -  NULL 
 **  Grp 11 -  ( pos 61 , len 1 ) 
R  
 **  Grp 12 -  ( pos 62 , len 1 ) 
R  
 **  Grp 13 -  ( pos 64 , len 1 ) 
o  
 **  Grp 14 -  ( pos 65 , len 1 ) 
i  
 **  Grp 15 -  ( pos 66 , len 1 ) 
Z  

-----------------------

 **  Grp 0 -  ( pos 69 , len 10 ) 
apqBBBroiZ  
 **  Grp 1 -  ( pos 69 , len 1 ) 
a  
 **  Grp 2 -  ( pos 70 , len 6 ) 
pqBBBr  
 **  Grp 3 -  ( pos 71 , len 4 ) 
qBBB  
 **  Grp 4 -  ( pos 72 , len 1 ) 
B  
 **  Grp 5 -  ( pos 73 , len 1 ) 
B  
 **  Grp 6 -  ( pos 74 , len 1 ) 
B  
 **  Grp 7 -  NULL 
 **  Grp 8 -  NULL 
 **  Grp 9 -  NULL 
 **  Grp 10 -  NULL 
 **  Grp 11 -  NULL 
 **  Grp 12 -  NULL 
 **  Grp 13 -  ( pos 76 , len 1 ) 
o  
 **  Grp 14 -  ( pos 77 , len 1 ) 
i  
 **  Grp 15 -  ( pos 78 , len 1 ) 
Z  

-----------------------

 **  Grp 0 -  ( pos 81 , len 8 ) 
apqvroiZ  
 **  Grp 1 -  ( pos 81 , len 1 ) 
a  
 **  Grp 2 -  ( pos 82 , len 4 ) 
pqvr  
 **  Grp 3 -  ( pos 83 , len 2 ) 
qv  
 **  Grp 4 -  ( pos 84 , len 1 ) 
v  
 **  Grp 5 -  NULL 
 **  Grp 6 -  NULL 
 **  Grp 7 -  NULL 
 **  Grp 8 -  NULL 
 **  Grp 9 -  NULL 
 **  Grp 10 -  NULL 
 **  Grp 11 -  NULL 
 **  Grp 12 -  NULL 
 **  Grp 13 -  ( pos 86 , len 1 ) 
o  
 **  Grp 14 -  ( pos 87 , len 1 ) 
i  
 **  Grp 15 -  ( pos 88 , len 1 ) 
Z  

-----------------------

 **  Grp 0 -  ( pos 91 , len 5 ) 
atswZ  
 **  Grp 1 -  ( pos 91 , len 1 ) 
a  
 **  Grp 2 -  ( pos 92 , len 1 ) 
t  
 **  Grp 3 -  ( pos 94 , len 1 ) 
w  
 **  Grp 4 -  NULL 
 **  Grp 5 -  NULL 
 **  Grp 6 -  NULL 
 **  Grp 7 -  NULL 
 **  Grp 8 -  NULL 
 **  Grp 9 -  NULL 
 **  Grp 10 -  NULL 
 **  Grp 11 -  NULL 
 **  Grp 12 -  NULL 
 **  Grp 13 -  NULL 
 **  Grp 14 -  NULL 
 **  Grp 15 -  ( pos 95 , len 1 ) 
Z  

Upvotes: 0

Views: 184

Answers (1)

Qtax

Reputation: 33908

Seems correct: the number of capturing groups a branch reset contributes is equal to the highest number of capturing groups in any of its branches, and that holds at every nesting level.

Here's a quote from perlre:

The numbering within each branch will be as normal, and any groups following this construct will be numbered as though the construct contained only one branch, that being the one with the most capture groups in it.
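
If you want to check a numbering by hand without running the regex, the quoted rule can be sketched as a small counter. This is only an illustration in Python, not how Perl implements it: the function name `number_groups` is made up for this example, and it assumes the pattern contains nothing but literals, plain capturing groups, `(?|...)` and `|` (no escapes, character classes, named groups or other `(?...)` constructs).

```python
def number_groups(pattern):
    """Return the group number given to each capturing '(' in order of appearance.

    Sketch of the perlre rule only; assumes the pattern uses nothing but
    literals, plain capturing groups, '(?|' branch resets and '|'.
    """
    numbers = []
    stack = []   # one frame per open parenthesis
    last = 0     # highest group number handed out on the current path
    i = 0
    while i < len(pattern):
        if pattern.startswith("(?|", i):
            # Branch reset: remember the counter on entry and the running maximum.
            stack.append({"reset": True, "start": last, "max": last})
            i += 3
            continue
        ch = pattern[i]
        if ch == "(":
            last += 1
            numbers.append(last)
            stack.append({"reset": False})
        elif ch == "|" and stack and stack[-1]["reset"]:
            # New branch of the innermost branch reset: record how far the
            # previous branch got, then restart from the entry value.
            frame = stack[-1]
            frame["max"] = max(frame["max"], last)
            last = frame["start"]
        elif ch == ")":
            frame = stack.pop()
            if frame["reset"]:
                # Continue as if only the branch with the most groups existed.
                last = max(frame["max"], last)
        i += 1
    return numbers


SAMPLE = ("(a)(?|x(y)z(?|(u)(u)(u)(u)(u)(u)|(e)(e)(e)|(c))(K)"
          "|(p(q(?|(M)(M)(M)(M)(?|(T)(T)(T)|(D)(D))(R)(R)|(B)(B)(B)|(v)))r)(o)(i)"
          "|(t)s(w))(Z)")

print(number_groups(SAMPLE))
print(max(number_groups(SAMPLE)))  # 15, the number of the final (Z) group
```

For the sample regex in the question this gives 9 for (K), 11 and 12 for the two (R) groups, 13 and 14 for (o) and (i), and 15 for (Z), which agrees with the numbering in the question and with the Perl output shown there.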

Upvotes: 2
