Kovpaev Alexey

Regex with balancing groups

I need to write a regex that captures the generic arguments (which can themselves be generic) of a type name written in a special notation like this:

System.Action[Int32,Dictionary[Int32,Int32],Int32]

Let's assume the type name is [\w.]+ and a parameter is [\w.,\[\]]+, so I need to grab only Int32, Dictionary[Int32,Int32] and Int32.

Basically, I need to capture a parameter only when the balancing-group stack is empty (that is, at bracket depth zero), but I don't really understand how to do that.
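To make "capture only when the stack is empty" concrete, here is a minimal non-regex sketch in C#. It swaps in a plain depth counter instead of balancing groups, and the class and method names are hypothetical. It splits on a comma only when the bracket depth is zero.

```csharp
using System;
using System.Collections.Generic;

static class TopLevelSplitter
{
    // Hypothetical helper: returns the comma-separated arguments between the
    // first '[' and its matching ']', splitting only where depth == 0.
    public static List<string> SplitTopLevel(string typeName)
    {
        var result = new List<string>();
        int open = typeName.IndexOf('[');
        if (open < 0) return result;

        int depth = 0;        // unmatched '[' seen inside the argument list
        int start = open + 1; // start index of the current argument
        for (int i = open + 1; i < typeName.Length; i++)
        {
            char c = typeName[i];
            if (c == '[')
            {
                depth++;
            }
            else if (c == ']')
            {
                if (depth == 0)
                {
                    // Closing bracket of the outer list: emit the last argument.
                    result.Add(typeName.Substring(start, i - start));
                    return result;
                }
                depth--;
            }
            else if (c == ',' && depth == 0)
            {
                // Top-level comma: the stack is empty, so cut here.
                result.Add(typeName.Substring(start, i - start));
                start = i + 1;
            }
        }
        throw new ArgumentException("Unbalanced brackets", nameof(typeName));
    }
}
```

Applied to System.Action[Int32,Dictionary[Int32,Int32],Int32], this sketch should return Int32, Dictionary[Int32,Int32] and Int32, which is the behaviour the regex is meant to reproduce.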

UPD

The answer below helped me solve the problem quickly (though without proper validation and with nesting depth limited to 1), but I've since managed to do it with balancing groups:

^[\w.]+                                              #Type name
\[(?<delim>)                                         #Opening bracket and first delimiter
[\w.]+                                               #Minimal content
(
[\w.]+                                                       
((?(open)|(?<param-delim>)),(?(open)|(?<delim>)))*   #Cutting param if balanced before comma and placing delimiter
((?<open>\[))*                                       #Counting [
((?<-open>\]))*                                      #Counting ]
)*
(?(open)|(?<param-delim>))\]                         #Cutting last param if balanced
(?(open)(?!)                                         #Checking balance
)$

Demo
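The (?<param-delim>) trick in this pattern relies on how .NET balancing groups record text: (?<name1-name2>) pops the latest name2 capture and stores the interval between that capture and the current position in name1. A minimal isolated sketch of that mechanism follows; the class, method, and the group names s and len are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class BalancingCaptureDemo
{
    // (?<s>) records an empty capture at the current position.
    // (?<len-s>) later pops "s" and stores the text between the end of that
    // capture and the current position in group "len", just as
    // (?<param-delim>) captures a parameter between two delimiters above.
    public static string Between(string input) =>
        Regex.Match(input, @"a(?<s>)b+(?<len-s>)c").Groups["len"].Value;
}
```

For example, Between("xabbbcx") should yield bbb: the text between the empty s marker and the point where len-s fires.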

UPD2 (final optimization)

^[\w.]+
\[(?<delim>)
[\w.]+
(?:
 (?:(?(open)|(?<param-delim>)),(?(open)|(?<delim>))[\w.]+)?
 (?:(?<open>\[)[\w.]+)?
 (?:(?<-open>\]))*
)*
(?(open)|(?<param-delim>))\]
(?(open)(?!)
)$
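A sketch of how the UPD2 pattern might be used from C#, assuming it is passed unchanged with RegexOptions.IgnorePatternWhitespace (the line breaks and spaces are layout only). The parameters are read from the captures of group "param"; the class and method names are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class GenericArgs
{
    // UPD2 pattern, unchanged; whitespace is ignored thanks to the option below.
    const string Pattern = @"
^[\w.]+
\[(?<delim>)
[\w.]+
(?:
 (?:(?(open)|(?<param-delim>)),(?(open)|(?<delim>))[\w.]+)?
 (?:(?<open>\[)[\w.]+)?
 (?:(?<-open>\]))*
)*
(?(open)|(?<param-delim>))\]
(?(open)(?!)
)$";

    // Returns the top-level generic arguments, or an empty array when the
    // input does not match (for example, when brackets are unbalanced).
    public static string[] Parse(string typeName)
    {
        var m = Regex.Match(typeName, Pattern, RegexOptions.IgnorePatternWhitespace);
        return m.Success
            ? m.Groups["param"].Captures.Cast<Capture>().Select(c => c.Value).ToArray()
            : Array.Empty<string>();
    }
}
```

Because the open stack is unbounded, nesting depth is not limited: Parse("A[B[C[D]],E]") should yield B[C[D]] and E, while an input with a missing ] should not match at all.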

Upvotes: 2

Answers (1)

Wiktor Stribiżew

I suggest capturing those values using

\w+(?:\.\w+)*\[(?:,?(?<res>\w+(?:\[[^][]*])?))*

See the regex demo.

Details:

  • \w+(?:\.\w+)* - match 1+ word chars followed by 0 or more sequences of a . and 1+ word chars
  • \[ - a literal [
  • (?:,?(?<res>\w+(?:\[[^][]*])?))* - 0 or more sequences of:
    • ,? - an optional comma
    • (?<res>\w+(?:\[[^][]*])?) - Group "res" capturing:
      • \w+ - one or more word chars (you may want [\w.]+ here instead)
      • (?:\[[^][]*])? - 1 or 0 (change ? to * to match 0 or more) sequences of a [, 0+ chars other than [ and ], and a closing ].

Here is a C# demo:

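using System;
using System.Linq;
using System.Text.RegularExpressions;

var line = "System.Action[Int32,Dictionary[Int32,Int32],Int32]";
var pattern = @"\w+(?:\.\w+)*\[(?:,?(?<res>\w+(?:\[[^][]*])?))*";
// Flatten every capture of group "res" from every match into one list
var result = Regex.Matches(line, pattern)
        .Cast<Match>()
        .SelectMany(x => x.Groups["res"].Captures.Cast<Capture>()
            .Select(t => t.Value))
        .ToList();
foreach (var s in result) // prints Int32, Dictionary[Int32,Int32], Int32 (one per line)
    Console.WriteLine(s);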

UPDATE: To handle [...] substrings nested to an arbitrary depth, use

\w+(?:\.\w+)*\[(?:\s*,?\s*(?<res>\w+(?:\[(?>[^][]+|(?<o>\[)|(?<-o>]))*(?(o)(?!))])?))*

See the regex demo
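A sketch of the UPDATE pattern applied from C# in the same way as the demo above, using an input nested two levels deep. The (?<o>\[) and (?<-o>]) pair keeps a stack of open brackets, and (?(o)(?!)) fails if any remain unclosed. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class NestedArgs
{
    // UPDATE pattern from the answer, unchanged.
    const string Pattern =
        @"\w+(?:\.\w+)*\[(?:\s*,?\s*(?<res>\w+(?:\[(?>[^][]+|(?<o>\[)|(?<-o>]))*(?(o)(?!))])?))*";

    // Collects every "res" capture from every match, in order.
    public static List<string> ExtractArgs(string line) =>
        Regex.Matches(line, Pattern)
            .Cast<Match>()
            .SelectMany(m => m.Groups["res"].Captures.Cast<Capture>().Select(c => c.Value))
            .ToList();
}
```

With System.Action[Int32,Dictionary[Int32,List[Int32]],Int32] this should yield Int32, Dictionary[Int32,List[Int32]] and Int32; the earlier pattern's [^][]* would stop at the inner [ and could not cover List[Int32].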

Upvotes: 2