disasterkid

Remove duplicate rows of a datatable based on a list of key fields

I am using the following code to remove duplicate rows from a DataTable based on the value of a single column (keyField):

IEnumerable<DataRow> uniqueContacts = dt.AsEnumerable()
                    .GroupBy(x =>  x[keyField].ToString())
                    .Select(g => g.First());
DataTable dtOut = uniqueContacts.CopyToDataTable();

How can I extend this code so that the LINQ query removes duplicates based on the values of a list of fields? For example, remove every row whose 'firstname' and 'lastname' match those of an earlier row.

Tim Schmelter

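You can group by an anonymous type. Anonymous types compare their property values, so rows with the same first and last name fall into the same group: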

IEnumerable<DataRow> uniqueContacts = dt.AsEnumerable()
                    .GroupBy(row =>  new { 
                        FirstName = row.Field<string>("FirstName"),
                        LastName  = row.Field<string>("LastName")
                    })
                    .Select(g => g.First());
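A minimal self-contained sketch of the anonymous-type approach, assuming a table with string columns `FirstName`, `LastName` and `Email`. The sample rows and the `AnonymousKeyDemo`, `BuildSample` and `Deduplicate` names are made up for illustration. `GroupBy` yields groups in the order their keys first appear, so `First()` keeps the earliest row for each name pair. On .NET Framework, `AsEnumerable`, `Field<T>` and `CopyToDataTable` also need a reference to `System.Data.DataSetExtensions`.

```csharp
using System;
using System.Data;
using System.Linq;

public static class AnonymousKeyDemo
{
    // Sample table; the column names and rows are illustrative only.
    public static DataTable BuildSample()
    {
        var dt = new DataTable();
        dt.Columns.Add("FirstName", typeof(string));
        dt.Columns.Add("LastName", typeof(string));
        dt.Columns.Add("Email", typeof(string));
        dt.Rows.Add("Ann", "Lee", "ann@a.example");
        dt.Rows.Add("Ann", "Lee", "ann@b.example"); // same name, different email
        dt.Rows.Add("Bob", "Lee", "bob@a.example");
        return dt;
    }

    // Keeps the first row for each (FirstName, LastName) pair. Anonymous types
    // compare all their properties by value, so equal names form one group.
    public static DataTable Deduplicate(DataTable dt)
    {
        return dt.AsEnumerable()
            .GroupBy(row => new
            {
                FirstName = row.Field<string>("FirstName"),
                LastName = row.Field<string>("LastName")
            })
            .Select(g => g.First())
            .CopyToDataTable();
    }

    public static void Main()
    {
        foreach (DataRow row in Deduplicate(BuildSample()).Rows)
            Console.WriteLine(row["FirstName"] + " " + row["LastName"] + " " + row["Email"]);
    }
}
```

Running it prints the `ann@a.example` row and the `Bob` row; the second `Ann Lee` row is dropped.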

Since you want a solution that works with a List<string> of column names that is only known at runtime, you could use this class:

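using System;
using System.Collections.Generic;
using System.Linq;

// Wraps one row's key values so the wrapper can be used as a GroupBy key.
// GroupBy compares keys with the default equality comparer, which calls the
// Equals(object) and GetHashCode() overrides below; two wrappers holding the
// same values in the same order therefore end up in the same group.
public class MultiFieldComparer : IEquatable<IEnumerable<object>>, IEqualityComparer<IEnumerable<object>>
{
    private readonly IEnumerable<object> objects;

    public MultiFieldComparer(IEnumerable<object> objects)
    {
        this.objects = objects;
    }

    // Equal when both sequences contain equal values (object.Equals) in the same order.
    public bool Equals(IEnumerable<object> x, IEnumerable<object> y)
    {
        return x.SequenceEqual(y);
    }

    // Combines the hash codes of all values; a null value contributes 0.
    public int GetHashCode(IEnumerable<object> objects)
    {
        unchecked
        {
            int hash = 17;
            foreach (object obj in objects)
                hash = hash * 23 + (obj == null ? 0 : obj.GetHashCode());
            return hash;
        }
    }

    public override int GetHashCode()
    {
        return GetHashCode(this.objects);
    }

    public override bool Equals(object obj)
    {
        MultiFieldComparer other = obj as MultiFieldComparer;
        if (other == null) return false;
        return this.Equals(this.objects, other.objects);
    }

    // IEquatable<T>.Equals must return false for null instead of throwing.
    public bool Equals(IEnumerable<object> other)
    {
        return other != null && this.Equals(this.objects, other);
    }
}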

and this extension method, which uses it:

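using System.Collections.Generic;
using System.Data;
using System.Linq;

// Extension methods must live in a static, non-generic class; the class name is arbitrary.
public static class DataRowDedupExtensions
{
    public static IEnumerable<DataRow> RemoveDuplicates(this IEnumerable<DataRow> rows, IEnumerable<string> fields)
    {
        return rows
            // ToArray() reads each row's key values once instead of re-enumerating them on every comparison.
            .GroupBy(row => new MultiFieldComparer(fields.Select(f => row[f]).ToArray()))
            .Select(g => g.First());
    }
}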

Then it's as simple as:

List<string> columns = new List<string> { "FirstName", "LastName" };
var uniqueContacts = dt.AsEnumerable().RemoveDuplicates(columns).CopyToDataTable();
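A variant, sketched under assumptions and not the answer's own code: build each key as an `object[]` and pass a separate comparer to the `GroupBy` overload that accepts an `IEqualityComparer<TKey>`, so the key stays a plain array and the equality logic lives in one place. `ObjectArrayComparer`, `KeyArrayDedupExtensions`, `RemoveDuplicatesByKey`, `DedupDemo` and the sample `Age` column are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical comparer: two keys are equal when they hold equal values
// (object.Equals) in the same order, so a boxed 30 matches another boxed 30.
public sealed class ObjectArrayComparer : IEqualityComparer<object[]>
{
    public static readonly ObjectArrayComparer Instance = new ObjectArrayComparer();

    public bool Equals(object[] x, object[] y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.SequenceEqual(y);
    }

    // Same hash combination as the class above; a null value contributes 0.
    public int GetHashCode(object[] key)
    {
        unchecked
        {
            int hash = 17;
            foreach (object obj in key)
                hash = hash * 23 + (obj == null ? 0 : obj.GetHashCode());
            return hash;
        }
    }
}

public static class KeyArrayDedupExtensions
{
    // Hypothetical extension: keeps the first row for each distinct combination
    // of values in the given columns. Each key is materialized once with ToArray().
    public static IEnumerable<DataRow> RemoveDuplicatesByKey(
        this IEnumerable<DataRow> rows, IEnumerable<string> fields)
    {
        string[] columns = fields.ToArray();
        return rows
            .GroupBy(row => columns.Select(c => row[c]).ToArray(), ObjectArrayComparer.Instance)
            .Select(g => g.First());
    }
}

public static class DedupDemo
{
    public static void Main()
    {
        var dt = new DataTable();
        dt.Columns.Add("FirstName", typeof(string));
        dt.Columns.Add("LastName", typeof(string));
        dt.Columns.Add("Age", typeof(int));
        dt.Rows.Add("Ann", "Lee", 30);
        dt.Rows.Add("Ann", "Lee", 31); // same name as the first row: dropped
        dt.Rows.Add("Ann", "Kim", 30);

        var columns = new List<string> { "FirstName", "LastName" };
        DataTable unique = dt.AsEnumerable().RemoveDuplicatesByKey(columns).CopyToDataTable();
        Console.WriteLine(unique.Rows.Count); // 2
    }
}
```

Whichever variant you use, `CopyToDataTable()` throws an `InvalidOperationException` when the sequence contains no rows (for example when `dt` itself is empty). Checking with `Any()` first and falling back to `dt.Clone()`, which copies only the schema, avoids that.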
