Reputation: 46591
I would like to match the numbers 123456789 and 012 using only one regex in the following strings. I am not sure how to handle all of the following scenarios with a single regex:
<one><num>123456789</num><code>012</code></one>
<two><code>012</code><num>123456789</num></two>
<three num="123456789" code="012" />
<four code="012" num="123456789" />
<five code="012"><num>123456789</num></five>
<six num="123456789"><code>012</code></six>
They also don't have to be on the same line like above, for example:
<seven>
<num>123456789</num>
<code>012</code>
</seven>
Upvotes: 1
Views: 142
Reputation: 4333
Parsing XML with regex is not a good idea. You can use XPath or LINQ to XML (XLinq) instead; LINQ to XML is easier. You must reference System.Xml.Linq and System.Xml and add the matching using directives. I wrote the code here, not in Visual Studio, so there may be minor bugs...
// var xml = ** load xml string
var document = XDocument.Parse(xml);
foreach (var i in document.Root.Elements())
{
    // Each value may be an attribute or a child element, independently of the other,
    // so look for the attribute first and fall back to the element.
    // The explicit string conversions return null when the attribute/element is missing.
    var num = (string)i.Attribute("num") ?? (string)i.Element("num");
    var code = (string)i.Attribute("code") ?? (string)i.Element("code");
    Console.WriteLine("Num: {0}", num);
    Console.WriteLine("Code: {0}", code);
}
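Since XPath is mentioned above as the other option, here is a minimal sketch of that route. It assumes the System.Xml.XPath extension methods for LINQ to XML are available; the class name XPathSketch, the helper Read, and the <root> wrapper around the sample elements are made up for illustration and are not part of the question.

```csharp
using System;
using System.Xml.Linq;
using System.Xml.XPath;

public static class XPathSketch
{
    // Hypothetical helper: reads `name` from either an attribute or a child element.
    // "@name | name" selects whichever of the two exists; string(...) returns the
    // text of the first selected node, or an empty string if neither is present.
    public static string Read(XElement e, string name)
    {
        return (string)e.XPathEvaluate("string(@" + name + " | " + name + ")");
    }

    public static void Main()
    {
        // Assumption: the sample elements are wrapped in a single <root> element
        // so that the input is a well-formed XML document.
        var xml = "<root>"
            + "<three num=\"123456789\" code=\"012\" />"
            + "<five code=\"012\"><num>123456789</num></five>"
            + "<six num=\"123456789\"><code>012</code></six>"
            + "</root>";

        foreach (var e in XDocument.Parse(xml).Root.Elements())
        {
            Console.WriteLine("Num: {0}, Code: {1}", Read(e, "num"), Read(e, "code"));
        }
    }
}
```

Unlike the attribute/element fallback with ??, this returns an empty string rather than null when a value is missing, so check for that if a missing value matters.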
Upvotes: 1
Reputation: 1839
This seems to be doing the trick:
new Regex(@"(?s)<(\w+)(?=.{0,30}(<num>\s*|num="")(\d+))(?=.{0,30}(<code>\s*|code="")(\d+)).*?(/>|</\1>)")
Groups 3 and 5 hold the "num" and "code" values, respectively. Each lookahead searches at most 30 characters past the tag name, which keeps the pattern reasonably strict; one of the main concerns when writing a regex is not capturing something you don't want (capturing what you want is easy).
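For reference, a minimal sketch of how the pattern could be run against the samples from the question. The class name RegexSketch and the helper Extract are made up for illustration; the pattern itself is unchanged.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RegexSketch
{
    // The pattern from this answer, unchanged.
    static readonly Regex Pattern = new Regex(
        @"(?s)<(\w+)(?=.{0,30}(<num>\s*|num="")(\d+))(?=.{0,30}(<code>\s*|code="")(\d+)).*?(/>|</\1>)");

    // Hypothetical helper: returns one "num,code" string per matched element.
    public static List<string> Extract(string input)
    {
        var pairs = new List<string>();
        foreach (Match m in Pattern.Matches(input))
        {
            // Group 3 is the num value, group 5 is the code value.
            pairs.Add(m.Groups[3].Value + "," + m.Groups[5].Value);
        }
        return pairs;
    }

    public static void Main()
    {
        // The seven samples from the question, one per line
        // (the last one spans four lines).
        var input = string.Join("\n", new[]
        {
            "<one><num>123456789</num><code>012</code></one>",
            "<two><code>012</code><num>123456789</num></two>",
            "<three num=\"123456789\" code=\"012\" />",
            "<four code=\"012\" num=\"123456789\" />",
            "<five code=\"012\"><num>123456789</num></five>",
            "<six num=\"123456789\"><code>012</code></six>",
            "<seven>",
            "<num>123456789</num>",
            "<code>012</code>",
            "</seven>"
        });

        foreach (var pair in Extract(input))
        {
            Console.WriteLine(pair);
        }
    }
}
```

Note that the samples are not wrapped in a root element here. With a wrapper, the pattern would also try to match the wrapper's own opening tag, and the lazy .*? would then stop at the first /> or closing wrapper tag it finds.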
Upvotes: 0
Reputation: 1634
At a more abstract level, the problem is to read a value from either an attribute or a child node named num or code. Since C# already has libraries for parsing XML documents (and such solutions are also acceptable according to your comments), it is more natural to take advantage of them. The following function returns the value of the specified attribute or node.
static string ParseNode(XmlElement e, string AttributeOrNodeName)
{
    if (e.HasAttribute(AttributeOrNodeName))
    {
        return e.GetAttribute(AttributeOrNodeName);
    }
    var node = e[AttributeOrNodeName];
    if (node != null)
    {
        return node.InnerText;
    }
    throw new Exception("The input element doesn't have specified attribute or node.");
}
Test code looks like this:
var doc = new XmlDocument();
var xmlString = "<test><node><num>123456789</num><code>012</code></node>\r\n"
+ "<node><code>012</code><num>123456789</num></node>\r\n"
+ "<node num=\"123456789\" code=\"012\" />\r\n"
+ "<node code=\"012\" num=\"123456789\" />\r\n"
+ "<node code=\"012\"><num>123456789</num></node>\r\n"
+ "<node num=\"123456789\"><code>012</code></node>\r\n"
+ @"<node>
<num>123456789</num>
<code>012</code>
</node>
</test>";
doc.LoadXml(xmlString);
foreach (var num in doc.DocumentElement.ChildNodes.Cast<XmlElement>().Select(x => ParseNode(x, "num")))
{
    Console.WriteLine(num);
}
Console.WriteLine();
foreach (var code in doc.DocumentElement.ChildNodes.Cast<XmlElement>().Select(x => ParseNode(x, "code")))
{
    Console.WriteLine(code);
}
In my environment (.NET 4), the code captures all the num and code values.
Upvotes: 1