Xaisoft

Reputation: 46591

Is there a common .NET regex that can be used to match the following numbers?

I would like to match the numbers 123456789 and 012 using only one regex in the following strings. I am not sure how to handle all the following scenarios with a single regex:

<one><num>123456789</num><code>012</code></one>

<two><code>012</code><num>123456789</num></two>

<three num="123456789" code="012" />

<four code="012" num="123456789" />

<five code="012"><num>123456789</num></five>

<six num="123456789"><code>012</code></six>

They also don't have to be on the same line like above, for example:

<seven>
<num>123456789</num>
<code>012</code>
</seven>

Upvotes: 1

Views: 142

Answers (3)

Medeni Baykal

Reputation: 4333

Parsing XML with regex is not a good idea. You can use XPath or LINQ to XML (XLinq) instead; LINQ to XML is easier. You must reference System.Xml.Linq and System.Xml and add the corresponding using directives. The code assumes the snippets are wrapped in a single root element. I wrote it here, not in Visual Studio, so there may be minor bugs.

// var xml = ** load xml string
var document = XDocument.Parse(xml);

foreach (var i in document.Root.Elements())
{
    // Each value may be an attribute or a child element, so try the attribute first.
    // Casting a null XAttribute/XElement to string yields null instead of throwing.
    var num = (string)i.Attribute("num") ?? (string)i.Element("num");
    var code = (string)i.Attribute("code") ?? (string)i.Element("code");

    Console.WriteLine("Num: {0}", num);
    Console.WriteLine("Code: {0}", code);
}
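
For comparison, the XPath route could look roughly like the sketch below. The `<root>` wrapper and the sample string are assumptions made for the demo (they are not from the question), and the snippet uses top-level statements, so it needs C# 9 or later; on an older compiler, move the statements into a Main method. The XPath union `@num | num` selects whichever of the attribute or the child element exists.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;
using System.Xml.XPath;

// Hypothetical <root> wrapper: XML needs a single root element.
var xml =
    "<root>" +
    "<three num=\"123456789\" code=\"012\" />" +
    "<five code=\"012\"><num>123456789</num></five>" +
    "<seven>\n<num>123456789</num>\n<code>012</code>\n</seven>" +
    "</root>";

var document = XDocument.Parse(xml);
var pairs = new List<string>();

foreach (var element in document.Root.Elements())
{
    // "@num | num" is the num attribute or the num child element;
    // string(...) returns the text of the first node in that set.
    var num = (string)element.XPathEvaluate("string(@num | num)");
    var code = (string)element.XPathEvaluate("string(@code | code)");
    pairs.Add(num + "/" + code);
}

foreach (var pair in pairs)
{
    Console.WriteLine(pair);
}
```

If an element has neither the attribute nor the child, `string(...)` returns an empty string rather than null, so check for that if missing values matter.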

Upvotes: 1

user1096188

Reputation: 1839

This seems to be doing the trick:

new Regex(@"(?s)<(\w+)(?=.{0,30}(<num>\s*|num="")(\d+))(?=.{0,30}(<code>\s*|code="")(\d+)).*?(/>|</\1>)")

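Groups 3 and 5 hold the "num" and "code" values, respectively. The pattern is also reasonably strict: one of the main concerns when writing a regex is not capturing something you don't want (capturing what you want is the easy part). Note that the .{0,30} bounds mean each value has to start within 30 characters of the tag name.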
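
A minimal sketch of how the pattern might be used, in case it helps. The sample input is assembled from three of the question's snippets (that choice is an assumption for the demo), and the snippet uses top-level statements, so it needs C# 9 or later; otherwise move the statements into a Main method.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Pattern copied from the answer above; group 1 is the tag name,
// groups 3 and 5 are the num and code digits.
var pattern = new Regex(
    @"(?s)<(\w+)(?=.{0,30}(<num>\s*|num="")(\d+))(?=.{0,30}(<code>\s*|code="")(\d+)).*?(/>|</\1>)");

// Demo input (assumption): three of the question's elements, one per line.
var input =
    "<one><num>123456789</num><code>012</code></one>\n" +
    "<four code=\"012\" num=\"123456789\" />\n" +
    "<seven>\n<num>123456789</num>\n<code>012</code>\n</seven>";

var results = new List<string>();
foreach (Match m in pattern.Matches(input))
{
    results.Add(m.Groups[1].Value + ": num=" + m.Groups[3].Value + ", code=" + m.Groups[5].Value);
}

foreach (var line in results)
{
    Console.WriteLine(line);
}
```

One caveat: the lookaheads are not confined to the current element, so for a very short element they can read into the next one. That does not happen with the input above, but it is another reason to prefer an XML parser for real data.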

Upvotes: 0

grapeot

Reputation: 1634

At a more abstract level, the problem is to read a value named num or code that may be stored either as an attribute or as a child node. Since C# already has libraries for parsing XML documents (and such solutions are acceptable according to your comments), it's more natural to take advantage of them. The following function returns the value of the specified attribute or child node.

    static string ParseNode(XmlElement e, string AttributeOrNodeName)
    {
        if (e.HasAttribute(AttributeOrNodeName))
        {
            return e.GetAttribute(AttributeOrNodeName);
        }
        var node = e[AttributeOrNodeName];
        if (node != null)
        {
            return node.InnerText;
        }
        throw new Exception("The input element doesn't have specified attribute or node.");
    }

A test looks like this (it needs using directives for System.Linq and System.Xml):

 var doc = new XmlDocument();
 var xmlString = "<test><node><num>123456789</num><code>012</code></node>\r\n"
     + "<node><code>012</code><num>123456789</num></node>\r\n"
     + "<node num=\"123456789\" code=\"012\" />\r\n"
     + "<node code=\"012\" num=\"123456789\" />\r\n"
     + "<node code=\"012\"><num>123456789</num></node>\r\n"
     + "<node num=\"123456789\"><code>012</code></node>\r\n"
     + @"<node>
         <num>123456789</num>
         <code>012</code>
         </node>
         </test>";
 doc.LoadXml(xmlString);
 foreach (var num in doc.DocumentElement.ChildNodes.Cast<XmlElement>().Select(x => ParseNode(x, "num")))
 {
     Console.WriteLine(num);
 }
 Console.WriteLine();
 foreach (var code in doc.DocumentElement.ChildNodes.Cast<XmlElement>().Select(x => ParseNode(x, "code")))
 {
     Console.WriteLine(code);
 }

In my environment (.NET 4), the code captures all the num and code values.

Upvotes: 1
