Muhammed Misir

Reputation: 481

Find duplicate child nodes in XML document

I have the following XML document:

<xml>
    <schedule orderno = "1">
           <item orderno = "1" />
           <item orderno = "2" />
           <item orderno = "3" />
           <item orderno = "2" />
    </schedule>
    <scool orderno = "2">
           <item orderno = "5" />
           <item orderno = "6" />
           <item orderno = "1" />
           <item orderno = "4" />
    </scool>
</xml>

The data in the XML file is inconsistent, and I need an XPath expression to find the duplicates.

The rule is that the @orderno attribute of the item elements within each parent node (schedule or scool) must have a unique value. For example, schedule contains 1, 2, 3, 2, so the @orderno value 2 is duplicated and therefore inconsistent.

I am using LINQ to XML:

XDocument.Parse(structure)
         .Descendants("item")
         .Attributes("orderno")
         .GroupBy(g => g.Value)
         .Where(g => g.Count() > 1)

My solution is suboptimal because it groups the items of all nodes, schedule and scool, together.

The output is 1 and 2, but 1 is not expected here, because it occurs only once within each parent.

How can I solve this problem?

Upvotes: 2

Views: 4659

Answers (1)

lorond

Reputation: 3896

Try grouping by the item's parent too, something like this:

XDocument.Parse(xml)
         .Descendants("item")
         .GroupBy(x => new { x.Parent.Name, orderno = x.Attribute("orderno").Value } )
         .Where(g => g.Count() > 1);
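
For reference, here is a minimal self-contained sketch of the same idea as a console program with top-level statements. It is an illustration, not necessarily how you would wire it into your code: the variable names and the output format are made up, the sample XML uses single-quoted attributes only to keep the C# string literal simple, and it groups by the parent element itself rather than by its name (an assumption that keeps two same-named parents apart).

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Sample document from the question (single-quoted attributes for readability).
const string xml = @"<xml>
    <schedule orderno='1'>
        <item orderno='1' />
        <item orderno='2' />
        <item orderno='3' />
        <item orderno='2' />
    </schedule>
    <scool orderno='2'>
        <item orderno='5' />
        <item orderno='6' />
        <item orderno='1' />
        <item orderno='4' />
    </scool>
</xml>";

// Group by (parent element, orderno) so that only siblings are compared.
// XElement uses reference equality, so each parent forms its own group.
var duplicates = XDocument.Parse(xml)
    .Descendants("item")
    .GroupBy(x => new { x.Parent, orderno = (string)x.Attribute("orderno") })
    .Where(g => g.Count() > 1)
    .Select(g => (Parent: g.Key.Parent.Name.LocalName,
                  OrderNo: g.Key.orderno,
                  Count: g.Count()))
    .ToList();

foreach (var d in duplicates)
    Console.WriteLine($"{d.Parent}: orderno {d.OrderNo} appears {d.Count} times");
```

With the sample data this reports only the value 2 in schedule; the value 1 is not reported, because it occurs once per parent.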

Update to select nodes with duplicated @orderno on any nesting level:

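XDocument.Parse(xml)
         .Root
         .XPathSelectElements("//*[@orderno]")   // needs: using System.Xml.XPath; already returns XElement
         .GroupBy(x => new { x.Parent, orderno = x.Attribute("orderno").Value })
         .Where(g => g.Count() > 1)
         .Dump();                                // LINQPad helper; elsewhere, enumerate the result instead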
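
Since the question asks for an XPath expression, the following sketch shows one possible XPath 1.0 formulation: select every item whose @orderno equals the @orderno of an earlier sibling item. Treat it as an assumption-laden illustration: the variable names are hypothetical, the sample XML uses single-quoted attributes to keep the string literal simple, and it returns the repeated occurrences rather than groups.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

// Sample document from the question (single-quoted attributes for readability).
const string xml = @"<xml>
    <schedule orderno='1'>
        <item orderno='1' />
        <item orderno='2' />
        <item orderno='3' />
        <item orderno='2' />
    </schedule>
    <scool orderno='2'>
        <item orderno='5' />
        <item orderno='6' />
        <item orderno='1' />
        <item orderno='4' />
    </scool>
</xml>";

// An item matches when any preceding sibling item carries the same @orderno,
// so only the second and later occurrences within one parent are selected.
var repeated = XDocument.Parse(xml)
    .XPathSelectElements("//item[@orderno = preceding-sibling::item/@orderno]")
    .ToList();

foreach (var item in repeated)
{
    var orderNo = (string)item.Attribute("orderno");
    Console.WriteLine($"{item.Parent.Name}: duplicate orderno {orderNo}");
}
```

Because the comparison is restricted to sibling items, the value 1 (once in schedule, once in scool) is not selected.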

Upvotes: 6
