Alex
Alex

Reputation: 14618

Alternative RegEx engine for .NET, supporting recursion

I came across a parsing problem, which would take a quite small regular expression to solve, except that the pattern to work, it should be recursive.
Example:

{([^{}]*(?:{(?1)})?)

What I want it to match, is a specific RTF header, but to do that, I need it to be recursive.

{\rtf1\ansi\ansicpg1252\deff0\deflang1031{\fonttbl{\f0\fnil\fcharset0 Tahoma;}}

Is there some kind of non-true RegEx-like engine implementation for .NET which would allow to find matches to these kinds of patterns (maybe even a different syntax)?

Update:

I do really appreciate everyone for informing me about the Balancing Group option in .NET implementation of Regular Expressions, especially Qtax, who has provided a very comprehensive link as a comment below, which helped me understand what's this all about, and also for posting an answer to my specific example. If you are reading this, and it also helped you, be sure to upvote that answer.
However... That did not answer the general question about recursion possibility in .NET Regex-like engine. This example, fortunately (I like challenges) is by far not the only one I've met. And other situations can't be solved using this solution, but only by being able to reference not a match, but to reuse a sequence of a pattern, to a point where recursion would be possible.

Upvotes: 4

Views: 493

Answers (2)

Alex
Alex

Reputation: 65

Even Qtax exemple is very good and clear, it didn't match completely for me because return {o{o}} instead of {f{o{o}}{bar}baz} .

After looking for a time, my solution is (using almost the same example) :

Input :

string re = @"{(((?<Counter>{)*[^{}]*)*((?<-Counter>})*[^{}]*)*)*(?(Counter)(?!))}";
string str = "foo {bar} baz {foo{bar{{baz}a{a{b}}}}} {f{o{o}}{bar{a{b{c}}{d}}}baz} {foo{bar}baz}";

Console.WriteLine("Input: \"{0}\"", str);
foreach (Match m in Regex.Matches(str, re))
{
    Console.WriteLine("Match: \"{0}\"", m);
}

Output :

Input: "foo {bar} baz {foo{bar{{baz}a{a{b}}}}} {f{o{o}}{bar{a{b{c}}{d}}}baz} {foo{bar}baz}"
Match: "{bar}"
Match: "{foo{bar{{baz}a{a{b}}}}}"
Match: "{f{o{o}}{bar{a{b{c}}{d}}}baz}"
Match: "{foo{bar}baz}"

Some explanation, I increment a counter for each { and decrement the counter at each }. And lastly the regex match only if the counter is empty ((?(Counter)(?!))).

It seems work for deep recursion and also with alternate bracket.

See this site which also help me to create this regex.

I hope this will help.

PS : If you want to match also string with forgotten } at the end, use :

string re = @"{(((?<Counter>{)*[^{}]*)*((?<-Counter>(}|$))*[^{}]*)*)*(?(Counter)(?!))(}|$)";
string str = "foo {bar} baz {foo{bar{{baz}a{a{b}}}}} {f{o{o}}{bar{a{b{c}}{d}}}baz} {foo{bar}b{az";

Upvotes: 3

Qtax
Qtax

Reputation: 33908

For your example using a balancing group would work.

You could use an expression like:

{
[^{}]*
(?:({)[^{}]*)*
(?'-1'})*
(?(1)(?!))
}

Example:

string re = @"{[^{}]*(?:({)[^{}]*)*(?'-1'})*(?(1)(?!))}";
string str = "foo {bar} baz {foo{bar{baz}}} {f{o{o}}{bar}baz} {foo{bar}baz}";

Console.WriteLine("Input: \"{0}\"", str);
foreach (Match m in Regex.Matches(str, re))
{
    Console.WriteLine("Match: \"{0}\"", m);
}

Output:

Input: "foo {bar} baz {foo{bar{baz}}} {f{o{o}}{bar}baz} {foo{bar}baz}"
Match: "{bar}"
Match: "{foo{bar{baz}}}"
Match: "{o{o}}"
Match: "{bar}"
Match: "{bar}"

Upvotes: 3

Related Questions