user16753

Reputation: 21

Numbering Regex Submatches

Is there a canonical ordering of submatch expressions in a regular expression?

For example: What is the order of the submatches in
"(([0-9]{3})\.([0-9]{3})\.([0-9]{3})\.([0-9]{3}))\s+([A-Z]+)" ?

a. (([0-9]{3})\.([0-9]{3})\.([0-9]{3})\.([0-9]{3}))\s+([A-Z]+)  
   (([0-9]{3})\.([0-9]{3})\.([0-9]{3})\.([0-9]{3}))  
   ([A-Z]+)  
   ([0-9]{3})  
   ([0-9]{3})  
   ([0-9]{3})  
   ([0-9]{3})  

b. (([0-9]{3})\.([0-9]{3})\.([0-9]{3})\.([0-9]{3}))\s+([A-Z]+)  
   (([0-9]{3})\.([0-9]{3})\.([0-9]{3})\.([0-9]{3}))  
   ([0-9]{3})  
   ([0-9]{3})  
   ([0-9]{3})  
   ([0-9]{3})  
   ([A-Z]+)  

or

c. something else.

Upvotes: 2

Views: 863

Answers (3)

Adrian Dunston

Reputation: 2940

In Perl 5 regular expressions, answer b is correct. Capture groups are numbered in the order of their opening parentheses, left to right.

Many other regular expression engines take their cues from Perl, but you would have to look up individual implementations to be sure. I'd suggest the book Mastering Regular Expressions for a deeper understanding.

Upvotes: 2

Asgeir S. Nilsen

Reputation: 1137

You count opening parentheses, left to right. So the order would be

(([0-9]{3})\.([0-9]{3})\.([0-9]{3})\.([0-9]{3}))
([0-9]{3})
([0-9]{3})
([0-9]{3})
([0-9]{3})
([A-Z]+)

At least this is what Perl would do. Other regex engines might have different rules.
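To check the ordering on an engine other than Perl, here is a minimal sketch using Python's standard `re` module, which follows the same count-the-opening-parentheses rule. The sample input string is made up for illustration; it is not from the question.

```python
import re

# The pattern from the question, with the dots escaped.
pattern = re.compile(
    r"(([0-9]{3})\.([0-9]{3})\.([0-9]{3})\.([0-9]{3}))\s+([A-Z]+)"
)

# Hypothetical input, chosen only so that every group matches something distinct.
match = pattern.match("192.168.001.010 HOST")

# Group 0 is the whole match; groups 1..N follow the opening parentheses.
for number in range(pattern.groups + 1):
    print(number, match.group(number))
```

Group 1 comes out as the full dotted address, groups 2 to 5 as its four parts, and group 6 as `HOST`, which matches option b.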

Upvotes: 0

jjrv

Reputation: 4345

They tend to be numbered in the order the capturing parens start, left to right. Therefore, option b.
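The "order the capturing parens start" rule can be spelled out as a small scanner. This is a simplified sketch, not how any real engine is implemented: the function name `capture_groups_in_order` is made up here, and it treats every `(?...)` construct as non-capturing, so named groups and other extensions are not handled.

```python
def capture_groups_in_order(pattern: str) -> list[str]:
    """Return the source text of each capturing group, in group-number order.

    Simplified: skips escaped characters and character classes, and treats
    anything starting with "(?" as non-capturing (an assumption, see above).
    """
    groups = {}        # group number -> source text of that group
    stack = []         # (group number or None, start index) for open parens
    count = 0          # number of capturing "(" seen so far
    in_class = False   # True while inside [...]
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            i += 2     # skip the escaped character, e.g. \. or \(
            continue
        if in_class:
            if ch == "]":
                in_class = False
        elif ch == "[":
            in_class = True
        elif ch == "(":
            if pattern.startswith("(?", i):
                stack.append((None, i))
            else:
                count += 1  # the number is assigned when the paren opens
                stack.append((count, i))
        elif ch == ")":
            number, start = stack.pop()
            if number is not None:
                groups[number] = pattern[start:i + 1]
        i += 1
    return [groups[n] for n in sorted(groups)]


ordered = capture_groups_in_order(
    r"(([0-9]{3})\.([0-9]{3})\.([0-9]{3})\.([0-9]{3}))\s+([A-Z]+)"
)
for number, source in enumerate(ordered, start=1):
    print(number, source)
```

The outer group gets number 1 because its parenthesis opens first, even though it closes after the four inner groups. The list starts at group 1; the whole-pattern entry shown first in option b corresponds to group 0.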

Upvotes: 4
