mukul nagpal

Reputation: 103

How to find duplicate elements in XElement

I am trying to find the duplicate elements in a list of XElement and write a generic function that removes them. Something like:

public List<XElement> RemoveDuplicatesFromXml(List<XElement> xele)
{
    // Takes the list of XElement and returns it after deleting the duplicate entries.
    return xele;
}

The XML is as follows:

<Execute ID="7300" Attrib1="xyz"    Attrib2="abc" Attrib3="mno" Attrib4="pqr" Attrib5="BCD" />
<Execute ID="7301" Attrib1="xyz"    Attrib2="abc" Attrib3="mno" Attrib4="pqr" Attrib5="BCD" />
<Execute ID="7302" Attrib1="xyz1"    Attrib2="abc" Attrib3="mno" Attrib4="pqr" Attrib5="BCD" />

I want to treat elements as duplicates when every attribute except ID matches, and then delete the one with the lower ID.

Thanks,

Upvotes: 0

Views: 1059

Answers (2)

Fabio

Reputation: 32455

Use LINQ's GroupBy:

var doc = XDocument.Parse(yourXmlString);
var groups = doc.Root
                .Elements()
                .GroupBy(element => new
                {
                    Attrib1 = element.Attribute("Attrib1").Value,
                    Attrib2 = element.Attribute("Attrib2").Value,
                    Attrib3 = element.Attribute("Attrib3").Value,
                    Attrib4 = element.Attribute("Attrib4").Value,
                    Attrib5 = element.Attribute("Attrib5").Value
                });

var duplicates = groups.SelectMany(group =>
{
    if(group.Count() == 1) // remove this if you want only duplicates
    {
        return group;
    }

    int minId = group.Min(element => int.Parse(element.Attribute("ID").Value));
    return group.Where(element => int.Parse(element.Attribute("ID").Value) > minId);
});

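The code above groups elements whose attributes (other than ID) match and drops the element with the lowest ID from each group of duplicates. Elements without duplicates are returned unchanged.
If you want to return only elements that have duplicates, remove the if branch from the last lambda.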
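
For reference, here is a minimal sketch of how the same grouping idea might be wrapped in the function signature from the question. It is not part of the original answer. The class name XmlDeduplicator and the helper KeyWithoutId are made up for illustration, and the sketch assumes that every element has an integer ID attribute, that attribute values do not contain the "|" or "=" separators, and that exactly one element (the one with the largest ID) should survive per group.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class XmlDeduplicator
{
    // Hypothetical helper: builds a grouping key from every attribute except "ID".
    // Attributes are sorted by name so that attribute order does not affect the key.
    // Assumption: values never contain '|' or '=', otherwise keys could collide.
    private static string KeyWithoutId(XElement element)
    {
        return string.Join("|", element.Attributes()
            .Where(a => a.Name.LocalName != "ID")
            .OrderBy(a => a.Name.ToString(), StringComparer.Ordinal)
            .Select(a => a.Name.ToString() + "=" + a.Value));
    }

    // Keeps one element per group of duplicates: the one with the largest ID.
    // Assumption: every element carries an ID attribute that parses as an int.
    public static List<XElement> RemoveDuplicatesFromXml(List<XElement> xele)
    {
        return xele
            .GroupBy(e => KeyWithoutId(e))
            .Select(group => group
                .OrderByDescending(e => int.Parse(e.Attribute("ID").Value))
                .First())
            .ToList();
    }
}
```

Unlike the snippet above, this keeps a single element per group even when three or more elements share the same attributes. Whether that matches the intent of "delete the one having lesser ID" is a judgment call.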

Upvotes: 1

Niyoko

Reputation: 7672

You can implement a custom IEqualityComparer<XElement> for this task:

class XComparer : IEqualityComparer<XElement>
{
    public IList<string> _exceptions;
    public XComparer(params string[] exceptions)
    {
        _exceptions = new List<string>(exceptions);
    }

    public bool Equals(XElement a, XElement b)
    {
        var attA = a.Attributes().ToList();
        var attB = b.Attributes().ToList();

        var setA = AttributeNames(attA);
        var setB = AttributeNames(attB);

        if (!setA.SetEquals(setB))
        {
            return false;
        }

        foreach (var e in setA)
        {
            var xa = attA.First(x => x.Name.LocalName == e);
            var xb = attB.First(x => x.Name.LocalName == e);

            if (xa.Value == null && xb.Value == null)
                continue;

            if (xa.Value == null || xb.Value == null)
                return false;

            if (!xa.Value.Equals(xb.Value))
            {
                return false;
            }
        }

        return true;
    }

    private HashSet<string> AttributeNames(IList<XAttribute> e)
    {
        return new HashSet<string>(e.Select(x =>x.Name.LocalName).Except(_exceptions));
    }

    public int GetHashCode(XElement e)
    {
        var h = 0;

        var atts = e.Attributes().ToList();
        var names = AttributeNames(atts);

        foreach (var a in names)
        {
            var xa = atts.First(x => x.Name.LocalName == a);

            if (xa.Value != null)
            {
                h = h ^ xa.Value.GetHashCode();
            }           
        }

        return h;
    }
}

Usage:

var comp = new XComparer("ID");
var distXEle = xele.Distinct(comp);

Please note that the IEqualityComparer implementation in this answer compares attributes by LocalName only and doesn't take namespaces into consideration. If an element has several attributes with the same local name (in different namespaces), this implementation uses the first one.

You can see the demo here : https://dotnetfiddle.net/w2DteS


Edit

If you want to "delete the one having lesser ID", that is, keep the element with the largest ID, you can chain the .Distinct call with .Select:

var comp = new XComparer("ID");
var distXEle = xele
    .Distinct(comp)
    .Select(z => xele
        .Where(a => comp.Equals(z, a))
        .OrderByDescending(a => int.Parse(a.Attribute("ID").Value))
        .First()
    );

This guarantees that, for each set of duplicates, you get the element with the largest ID.
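
As a possible alternative, the same result can probably be reached in one pass by handing the comparer to GroupBy instead of rescanning the list for each distinct element. The sketch below is only an illustration, not the code from the demo: AttributeComparer is a hypothetical, compact stand-in for XComparer so the example is self-contained, Dedup.KeepLargestId is a made-up name, and every element is assumed to have an integer ID attribute.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical compact comparer: two elements are equal when their attributes,
// ignoring the excluded local names, have the same names and values.
public sealed class AttributeComparer : IEqualityComparer<XElement>
{
    private readonly HashSet<string> _excluded;

    public AttributeComparer(params string[] excluded)
    {
        _excluded = new HashSet<string>(excluded);
    }

    // Maps expanded attribute name -> value for the attributes that count.
    private Dictionary<string, string> Relevant(XElement e)
    {
        return e.Attributes()
            .Where(a => !_excluded.Contains(a.Name.LocalName))
            .ToDictionary(a => a.Name.ToString(), a => a.Value);
    }

    public bool Equals(XElement a, XElement b)
    {
        var da = Relevant(a);
        var db = Relevant(b);
        return da.Count == db.Count
            && da.All(p => db.TryGetValue(p.Key, out var v) && v == p.Value);
    }

    public int GetHashCode(XElement e)
    {
        // XOR keeps the hash independent of attribute order.
        int h = 0;
        foreach (var p in Relevant(e))
            h ^= p.Key.GetHashCode() ^ p.Value.GetHashCode();
        return h;
    }
}

public static class Dedup
{
    // Groups equal elements with the comparer, then keeps the largest ID per group.
    // Assumption: every element carries an ID attribute that parses as an int.
    public static List<XElement> KeepLargestId(IEnumerable<XElement> xele)
    {
        var comp = new AttributeComparer("ID");
        return xele
            .GroupBy(e => e, comp)
            .Select(g => g.OrderByDescending(e => int.Parse(e.Attribute("ID").Value)).First())
            .ToList();
    }
}
```

Grouping with the comparer avoids the nested Where over the whole list, which may matter for large inputs.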

Upvotes: 1
