user1044169

Reputation: 2866

RegEx to parse nested tags?

I have text like this:

This is {name1:value1}{name2:{name3:even dipper {name4:valu4} dipper} some inner text} text

I want to extract data like this:

Name: name1
Value: value1

Name: name2
Value: {name3:even dipper {name4:valu4} dipper} some inner text

I would then process each value recursively to extract the nested fields. Can you recommend a regular expression to do this?

Upvotes: 0

Views: 2145

Answers (2)

Qtax

Reputation: 33908

In C# (.NET) you can use balancing groups to count and balance the brackets. The pattern is written in extended mode, so whitespace and comments are ignored:

{ (?'name' \w+ ) :       # start of tag
(?'value'                # named capture
  (?>                    # don't backtrack
    (?:
      [^{}]+             # not brackets
    | (?'open' { )       # count opening bracket
    | (?'close-open' } ) # subtract closing bracket (matches only if open count > 0)
    )*
  )
  (?(open)(?!))          # make sure open is not > 0
)
}                        # end of tag

Example:

using System;
using System.Text.RegularExpressions;

string re = @"(?x)       # enable eXtended mode (comments/spaces ignored)
{ (?'name' \w+ ) :       # start of tag
(?'value'                # named capture
  (?>                    # don't backtrack
    (?:
      [^{}]+             # not brackets
    | (?'open' { )       # count opening bracket
    | (?'close-open' } ) # subtract closing bracket (matches only if open count > 0)
    )*
  )
  (?(open)(?!))          # make sure open is not > 0
)
}                        # end of tag
";

string str = @"This is {name1:value1}{name2:{name3:even dipper {name4:valu4} dipper} some inner text} text";

foreach (Match m in Regex.Matches(str, re))
{
    Console.WriteLine("name: {0}, value: {1}", m.Groups["name"], m.Groups["value"]);
}

Output:

name: name1, value: value1
name: name2, value: {name3:even dipper {name4:valu4} dipper} some inner text
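
The question also asks about processing each value recursively. A minimal sketch of that step, reusing the same pattern: the `NestedTags` class and the `Walk` helper are hypothetical names made up for this illustration, not part of any library, and the sketch assumes tag names only contain `\w` characters, as in the pattern above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class NestedTags
{
    // Same balancing-group pattern as above, built once and reused.
    static readonly Regex TagRegex = new Regex(@"(?x)
        { (?'name' \w+ ) :
        (?'value'
          (?>
            (?:
              [^{}]+
            | (?'open' { )
            | (?'close-open' } )
            )*
          )
          (?(open)(?!))
        )
        }");

    // Hypothetical helper: yields every tag with its nesting depth,
    // each outer tag before the tags nested inside its value.
    public static IEnumerable<(int Depth, string Name, string Value)> Walk(string text, int depth = 0)
    {
        foreach (Match m in TagRegex.Matches(text))
        {
            string value = m.Groups["value"].Value;
            yield return (depth, m.Groups["name"].Value, value);

            // Run the same regex on the captured value to find nested tags.
            foreach (var inner in Walk(value, depth + 1))
                yield return inner;
        }
    }

    static void Main()
    {
        string str = "This is {name1:value1}{name2:{name3:even dipper {name4:valu4} dipper} some inner text} text";

        foreach (var (depth, name, value) in Walk(str))
            Console.WriteLine("{0}{1} = {2}", new string(' ', depth * 2), name, value);
    }
}
```

Each level re-runs the regex on the captured value, so a deeply nested input is scanned once per nesting level. For inputs like the one in the question that should not matter.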

Upvotes: 3

Qtax

Reputation: 33908

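In Perl, PHP, or any other PCRE-based engine this is simpler, because the pattern can recurse into itself with (?R). With the x (extended) modifier enabled so that whitespace and comments are ignored, you can use an expression like: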

{(\w+):         # start of tag
   ((?:
      [^{}]+    # not a tag
   |  (?R)      # a tag (recurse to match the whole regex)
   )*)
}               # end of tag

Upvotes: 2
