POIR

Reputation: 3190

Regex nested tags

I want to parse the following text with a regex, evaluating the contents of each function tag.

Answers: <function>2+2
                 <function>1+3</function> 
        </function>.  
Thanks for your time. 
<function>sayGoodbye() 
         <function>10*10</function> 
         writeYourName()
</function>

Below is a recursive method that should transform the given text into:

Answers: 44 . Thanks for your time. Goodbye 100 Rex.

private static readonly string TagFormulaStart = "<function>";
private static readonly string TagFormulaEnd = "</function>";

public static string Calculate(string formula)
{
    var pattern = string.Format("{0}(((.|\r|\n)*?)){1}", TagFormulaStart, TagFormulaEnd);
    var matches = Regex.Matches(formula, pattern);

    if (matches.Count == 0)
    {
        return formula;
    }
    else
    {
        var firstAppearanceOfTAG = matches[0].ToString();
        var formulaToCalculate = firstAppearanceOfTAG.Replace(TagFormulaStart, string.Empty).Replace(TagFormulaEnd, string.Empty);
        var result = BgProcessorLib.Evaluator.EvaluateString(formulaToCalculate, null, false);

        formula = formula.Replace(firstAppearanceOfTAG, result);

        return Calculate(formula);
    }
}

The problem is that, with nested tags, my regex /<function>(((.|\r|\n)*?))<\/function>/igm stops at the first closing </function> tag, so the match runs from the outer opening tag to the inner closing tag.
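
To make the behaviour concrete, here is a minimal sketch that returns the first match of that lazy pattern. The class and method names (LazyMatchDemo, FirstMatch) are made up for the illustration and are not part of my real code.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, only for demonstrating the problem.
public static class LazyMatchDemo
{
    // Returns the text of the first match of the lazy pattern from the question.
    public static string FirstMatch(string text)
    {
        return Regex.Match(text, @"<function>(((.|\r|\n)*?))</function>").Value;
    }
}
```

For the input <function>2+2 <function>1+3</function> </function> this returns <function>2+2 <function>1+3</function>, i.e. the outer opening tag paired with the inner closing tag.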

I attached a picture to make it clearer.


Upvotes: 0

Views: 882

Answers (2)

Nikolay Prokopyev

Reputation: 1312

About the XML approach.

First, make your source valid XML by wrapping it in a root tag, i.e. <root> Answer <function... </root>.

Then parse it with LINQ to XML:

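// Requires: using System.Linq; using System.Xml.Linq;
XElement root = XElement.Parse(sourceString);

// Reverse() visits inner <function> elements before the elements that contain them;
// ToList() snapshots the sequence so the tree can be modified while looping.
foreach (var funct in root.Descendants("function").Reverse().ToList()) {
   var evaluated = evaluate(funct.Value); // evaluate must be defined elsewhere
   funct.Value = evaluated;               // XElement exposes Value, not InnerText
}

var result = root.ToString();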

Afterwards you can strip the remaining tags with a regex or a simple string replace (remove everything between angle brackets <>). LINQ to XML probably has a built-in way to do that too, but I don't know it.
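
The whole XML approach can be sketched end to end as follows. This is only a sketch under some assumptions: XmlApproachDemo and Calculate are hypothetical names, the evaluate delegate stands in for whatever evaluator you actually use, and the input must be well-formed XML once wrapped (an expression containing < or & would make parsing fail). Each function element is replaced by a text node holding its result, and root.Value then returns the text with no tags left.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper illustrating the XML approach.
public static class XmlApproachDemo
{
    public static string Calculate(string source, Func<string, string> evaluate)
    {
        // Wrap the fragment in a root element so it parses as one XML document.
        XElement root = XElement.Parse(
            "<root>" + source + "</root>", LoadOptions.PreserveWhitespace);

        // Reverse document order: nested elements are handled before their parents.
        foreach (XElement funct in root.Descendants("function").Reverse().ToList())
        {
            // Value is the concatenated text of the element, which already
            // contains the results of any inner elements replaced earlier.
            funct.ReplaceWith(new XText(evaluate(funct.Value)));
        }

        // All <function> elements are gone, so Value is the plain result text.
        return root.Value;
    }
}
```

With a stand-in evaluator such as s => "[" + s.Trim() + "]", the input A <function>2+2 <function>1+3</function> </function>. becomes A [2+2 [1+3]]., which shows that the inner tag is evaluated first and its result is passed to the outer one.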

Upvotes: 0

Manfred Radlwimmer

Reputation: 13394

While I wouldn't recommend solving this via RegEx, if you really want to, you have to tell your Regex not to include another opening tag, e.g.:

<function>((?!<function>).)*?<\/function>

Warning: terrible performance, for educational purposes only!

Also, you should escape your input:

var pattern = string.Format("{0}((?!{0}).)*?{1}", 
    Regex.Escape(TagFormulaStart), 
    Regex.Escape(TagFormulaEnd));

var matches = Regex.Matches(formula, pattern, RegexOptions.Singleline);
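
As a rough sketch of how this pattern could drive the replacement, the loop below evaluates innermost tags first and repeats until no tag is left. InnermostFirstDemo, Calculate and the evaluate delegate are hypothetical names, and the delegate is a placeholder for your real evaluator. The pattern adds a capture group around the tag body so it can be read directly. The loop assumes evaluate never returns text that itself contains a complete tag pair, otherwise it would not terminate.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper illustrating innermost-first replacement.
public static class InnermostFirstDemo
{
    private const string TagStart = "<function>";
    private const string TagEnd = "</function>";

    // Matches a tag pair whose body contains no further opening tag,
    // i.e. an innermost <function>...</function>; group 1 is the body.
    private static readonly Regex Innermost = new Regex(
        string.Format("{0}((?:(?!{0}).)*?){1}",
            Regex.Escape(TagStart), Regex.Escape(TagEnd)),
        RegexOptions.Singleline);

    public static string Calculate(string text, Func<string, string> evaluate)
    {
        // Each pass replaces every currently innermost tag with its result,
        // which turns the enclosing tags into innermost ones for the next pass.
        while (Innermost.IsMatch(text))
        {
            text = Innermost.Replace(text, m => evaluate(m.Groups[1].Value));
        }
        return text;
    }
}
```

For example, with the stand-in evaluator s => "[" + s.Trim() + "]", the input A <function>2+2 <function>1+3</function> </function>. turns into A [2+2 [1+3]]. after two passes.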

This will not account for a lot of realistic use-cases, so again: I wouldn't recommend using RegEx in this particular case.

Online-Demo
Fiddle

Upvotes: 3
