Reputation: 3200
I have a couple of XML files that contain lots of duplicate entries, such as these:
<annotations>
  <annotation value=",Clear,Outdoors" eventID="2">
    <image location="Location 1" />
    <image location="Location 2" />
    <image location="Location 2" />
  </annotation>
  <annotation value=",Not a problem,Gravel,Shopping" eventID="2">
    <image location="Location 3" />
    <image location="Location 4" />
    <image location="Location 5" />
    <image location="Location 5" />
    <image location="Location 5" />
  </annotation>
</annotations>
I want to remove the duplicate image elements within each annotation element. My approach was to copy all the elements to a list and then compare them:
foreach (var el in xdoc.Descendants("annotation").ToList())
{
    foreach (var x in el.Elements("image").Attributes("location").ToList())
    {
        //add elements to a list
    }
}
Halfway through I realized this is very inefficient and time-consuming. I'm fairly new to XML, and I was wondering whether there are any built-in methods in C# that I can use to remove duplicates.
I tried using
if(!x.value.Distinct()) // can't convert collections to bool
x.Remove();
But that doesn't work, neither does
if(x.value.count() > 1) // value.count returns the number of elements.
x.Remove()
Upvotes: 1
Views: 8066
Reputation: 771
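using System.Linq;
using System.Xml.Linq;

XDocument xDoc = XDocument.Parse(xmlString);
// Within each annotation, group the images by location, skip the first
// element of each group, and remove the remaining duplicates.
xDoc.Root.Elements("annotation")
    .SelectMany(s => s.Elements("image")
        .GroupBy(g => g.Attribute("location").Value)
        .SelectMany(m => m.Skip(1))).Remove();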
Upvotes: 6
Reputation: 17206
There are a couple of things that you could do here. In addition to the other answers so far, note that Distinct() has an overload that takes an IEqualityComparer. You could use something like this ProjectionEqualityComparer:
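var images = xdoc.Descendants("image")
    .Distinct(ProjectionEqualityComparer<XElement>.Create(xe => xe.Attributes("location").First().Value));
... which would give you one "image" element for each distinct location value. Note that this compares locations across the whole document rather than within each annotation, and it only selects elements; it does not remove anything from the document.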
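ProjectionEqualityComparer is not part of the .NET base class library, so here is a minimal sketch of the same idea with a hand-written comparer, applied per annotation and followed by an actual removal. LocationComparer is a hypothetical name introduced for illustration, and the inline XML is a shortened copy of the sample from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical comparer: two <image> elements are treated as equal when
// their location attributes match (a missing attribute compares as null).
sealed class LocationComparer : IEqualityComparer<XElement>
{
    public bool Equals(XElement a, XElement b) =>
        (string)a?.Attribute("location") == (string)b?.Attribute("location");

    public int GetHashCode(XElement e) =>
        ((string)e.Attribute("location") ?? string.Empty).GetHashCode();
}

static class Program
{
    static void Main()
    {
        var xdoc = XDocument.Parse(
            @"<annotations>
                <annotation value=',Clear,Outdoors' eventID='2'>
                  <image location='Location 1' />
                  <image location='Location 2' />
                  <image location='Location 2' />
                </annotation>
              </annotations>");

        foreach (var annotation in xdoc.Descendants("annotation"))
        {
            // A fresh set per annotation, so duplicates are only detected
            // among siblings. Add returns false for a location already seen.
            var seen = new HashSet<XElement>(new LocationComparer());

            // ToList materializes the duplicates before the tree is modified.
            var duplicates = annotation.Elements("image")
                                       .Where(img => !seen.Add(img))
                                       .ToList();

            foreach (var duplicate in duplicates)
                duplicate.Remove();
        }

        Console.WriteLine(xdoc);
    }
}
```

The HashSet is used instead of Distinct() here because it makes "keep the first, drop the rest" explicit and yields the elements to delete directly.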
Upvotes: 0
Reputation: 12075
If your duplicates are always in this form, then you could do this with a bit of XSLT to remove duplicate nodes. The XSLT for this is:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="image[@location = preceding-sibling::image/@location]"/>
</xsl:stylesheet>
If this is something that happens frequently, it might be worth keeping that stylesheet loaded in an XslCompiledTransform instance.
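As a rough sketch of how that might look in C#, the following loads the stylesheet once and applies it to a document. The stylesheet and the input are embedded as strings only to keep the example self-contained; in practice they would more likely come from files, and the input here is a made-up reduction of the sample from the question.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

static class Program
{
    // Same stylesheet as above, with single-quoted attributes so it fits in a C# string.
    const string Stylesheet = @"<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:template match='node()|@*'>
    <xsl:copy><xsl:apply-templates select='node()|@*'/></xsl:copy>
  </xsl:template>
  <xsl:template match='image[@location = preceding-sibling::image/@location]'/>
</xsl:stylesheet>";

    static void Main()
    {
        // Load (and compile) the stylesheet once; the instance can be reused
        // for every document that needs the same clean-up.
        var xslt = new XslCompiledTransform();
        using (var reader = XmlReader.Create(new StringReader(Stylesheet)))
            xslt.Load(reader);

        const string input =
            "<annotations><annotation>" +
            "<image location='A'/><image location='A'/><image location='B'/>" +
            "</annotation></annotations>";

        // Transform the input and write the de-duplicated XML to the console.
        using (var source = XmlReader.Create(new StringReader(input)))
        using (var writer = XmlWriter.Create(Console.Out, xslt.OutputSettings))
            xslt.Transform(source, writer);
    }
}
```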
Or you can simply get a list of all duplicate nodes using this XPath:
/annotations/annotation/image[@location = preceding-sibling::image/@location]
and remove them from their parent.
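One way to run that XPath from C# is the XPathSelectElements extension method in System.Xml.XPath, which works directly on an XDocument. This is a minimal sketch under that assumption; the inline XML is again a made-up reduction of the sample from the question.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

static class Program
{
    static void Main()
    {
        var xdoc = XDocument.Parse(
            "<annotations><annotation>" +
            "<image location='A'/><image location='A'/><image location='B'/>" +
            "</annotation></annotations>");

        // Select every image whose location already appears on an earlier sibling.
        // ToList takes a snapshot so the tree is not modified while it is queried.
        var duplicates = xdoc
            .XPathSelectElements(
                "/annotations/annotation/image[@location = preceding-sibling::image/@location]")
            .ToList();

        foreach (var duplicate in duplicates)
            duplicate.Remove();

        Console.WriteLine(xdoc);
    }
}
```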
Upvotes: 0