Reputation: 21
Is there a canonical ordering of submatch expressions in a regular expression?
For example: What is the order of the submatches in
"(([0-9]{3})\.([0-9]{3})\.([0-9]{3})\.([0-9]{3}))\s+([A-Z]+)" ?
a. (([0-9]{3})\.([0-9]{3})\.([0-9]{3})\.([0-9]{3}))\s+([A-Z]+)
(([0-9]{3})\.([0-9]{3})\.([0-9]{3})\.([0-9]{3}))
([A-Z]+)
([0-9]{3})
([0-9]{3})
([0-9]{3})
([0-9]{3})
b. (([0-9]{3})\.([0-9]{3})\.([0-9]{3})\.([0-9]{3}))\s+([A-Z]+)
(([0-9]{3})\.([0-9]{3})\.([0-9]{3})\.([0-9]{3}))
([0-9]{3})
([0-9]{3})
([0-9]{3})
([0-9]{3})
([A-Z]+)
or
c. something else.
Upvotes: 2
Views: 863
Reputation: 2940
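In Perl 5 regular expressions, answer b is correct. Submatch groups are numbered by the position of their opening parenthesis, counting from left to right, so an outer group always gets a lower number than the groups nested inside it.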
Many other regular expression engines take their cues from Perl, but you would have to look up individual implementations to be sure. I'd suggest the book Mastering Regular Expressions for a deeper understanding.
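As a quick way to check a particular engine, here is a minimal sketch using Python's `re` module (my choice for illustration, not something from the question). The sample input string is made up purely to exercise every group. Python follows the same rule as Perl here: groups are numbered by their opening parenthesis, left to right.

```python
import re

# Pattern from the question, with the dots escaped.
PATTERN = r"(([0-9]{3})\.([0-9]{3})\.([0-9]{3})\.([0-9]{3}))\s+([A-Z]+)"

# Hypothetical sample input, chosen only so that every group matches.
SAMPLE = "192.168.001.010 HOST"

match = re.match(PATTERN, SAMPLE)
if match is None:
    raise ValueError("sample input did not match the pattern")

# groups() returns the submatches in group-number order (1, 2, 3, ...).
groups = match.groups()

for number, text in enumerate(groups, start=1):
    print(number, text)
```

Group 1 is the whole dotted quad, groups 2 to 5 are the four three-digit parts, and group 6 is the uppercase word, which is the ordering in option b.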
Upvotes: 2
Reputation: 1137
You count opening parentheses, left to right. So the order would be
(([0-9]{3})\.([0-9]{3})\.([0-9]{3})\.([0-9]{3}))
([0-9]{3})
([0-9]{3})
([0-9]{3})
([0-9]{3})
([A-Z]+)
At least this is what Perl would do. Other regex engines might have different rules.
Upvotes: 0
Reputation: 4345
They tend to be numbered in the order the capturing parens start, left to right. Therefore, option b.
Upvotes: 4