deamon

Reputation: 92397

Regex to find named capturing groups with Go programming language

I'm looking for a regex to find named capturing groups in (other) regex strings.

Example: I want to find (?P<country>m((a|b).+)n), (?P<city>.+) and (?P<street>(5|6)\. .+) in the following regex:

/(?P<country>m((a|b).+)n)/(?P<city>.+)/(?P<street>(5|6)\. .+)

I tried the following regex to find the named capturing groups:

var subGroups string = `(\(.+\))*?`
var prefixedSubGroups string = `.+` + subGroups
var postfixedSubGroups string = subGroups + `.+`
var surroundedSubGroups string = `.+` + subGroups + `.+`
var capturingGroupNameRegex *regexp.Regexp = regexp.MustCompile(
    `(?U)` +
    `\(\?P<.+>` +
    `(` + prefixedSubGroups + `|` + postfixedSubGroups + `|` + surroundedSubGroups + `)` +
    `\)`)

?U makes greedy quantifiers (+ and *) non-greedy and non-greedy quantifiers (such as *?) greedy. Details are in the Go regexp documentation.
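
As a quick illustration of what the U flag does, here is a minimal sketch. The helper name firstMatch is made up for this example; it only wraps regexp.MustCompile and FindString from the standard library.

```go
package main

import (
	"fmt"
	"regexp"
)

// firstMatch is a hypothetical helper: it returns the leftmost match
// of pattern in s, or "" if there is none.
func firstMatch(pattern, s string) string {
	return regexp.MustCompile(pattern).FindString(s)
}

func main() {
	fmt.Println(firstMatch(`a+`, "aaa"))      // no flag: + is greedy, prints "aaa"
	fmt.Println(firstMatch(`(?U)a+`, "aaa"))  // U flag: + becomes non-greedy, prints "a"
	fmt.Println(firstMatch(`(?U)a+?`, "aaa")) // U flag: +? becomes greedy, prints "aaa"
}
```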

However, my regex doesn't work because the parentheses are not matched correctly.

Upvotes: 6

Views: 2923

Answers (1)

Tim Pietzcker

Reputation: 336128

Matching arbitrarily nested parentheses correctly is not possible with regular expressions because arbitrary (recursive) nesting cannot be described by a regular language.

Some modern regex flavors do support recursion (Perl, PCRE) or balanced matching (.NET), but Go is not one of them (the docs explicitly say that Perl's (?R) construct is not supported by the RE2 library that Go's regexp package appears to be based on). You need to build a recursive descent parser, not a regex.
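
For the specific task in the question, a full parser may be more than is needed. A hand-written scanner that tracks parenthesis depth with a stack is often enough. The following is only a sketch under stated assumptions, not a complete regex parser: the function name namedGroups is made up for this example, an escape is taken to be a backslash plus one byte, parentheses inside a [...] character class are treated as literals, and \Q...\E quoting, POSIX classes such as [[:alpha:]], and the newer (?<name>...) spelling are not handled.

```go
package main

import (
	"fmt"
	"strings"
)

// namedGroups (hypothetical helper) returns every "(?P<name>...)" group in
// pattern, nested ones included, in the order their closing parenthesis
// appears. It balances parentheses by hand instead of using a regex.
func namedGroups(pattern string) []string {
	var groups []string
	var starts []int // stack of group start offsets; -1 means "not a named group"
	inClass := false
	for i := 0; i < len(pattern); i++ {
		switch c := pattern[i]; {
		case c == '\\':
			i++ // assumption: an escape is a backslash plus one byte
		case inClass:
			if c == ']' {
				inClass = false
			}
		case c == '[':
			inClass = true
			// A leading ']' (optionally after '^') is a literal member of the class.
			if i+1 < len(pattern) && pattern[i+1] == '^' {
				i++
			}
			if i+1 < len(pattern) && pattern[i+1] == ']' {
				i++
			}
		case c == '(':
			if strings.HasPrefix(pattern[i:], "(?P<") {
				starts = append(starts, i)
			} else {
				starts = append(starts, -1)
			}
		case c == ')':
			if len(starts) == 0 {
				continue // unbalanced ')': ignored in this sketch
			}
			start := starts[len(starts)-1]
			starts = starts[:len(starts)-1]
			if start >= 0 {
				groups = append(groups, pattern[start:i+1])
			}
		}
	}
	return groups
}

func main() {
	re := `/(?P<country>m((a|b).+)n)/(?P<city>.+)/(?P<street>(5|6)\. .+)`
	for _, g := range namedGroups(re) {
		fmt.Println(g)
	}
}
```

For the regex from the question, this prints (?P<country>m((a|b).+)n), (?P<city>.+) and (?P<street>(5|6)\. .+), one per line. The stack is what a regular expression cannot provide: it remembers which opening parenthesis each closing one belongs to, at any nesting depth.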

Upvotes: 7
