user76071

Reputation:

Calling Distinct() on DataRow collection

I'm trying to find unique rows in a data table using the Distinct() extension method. Some rows contain exactly the same data, but for some reason the hash codes for these rows differ from each other.

I wrote a comparer class implementing IEqualityComparer<DataRow>; however, I think what I'm doing in GetHashCode() is cheesy and nasty.

The reason I've done it this way is that Equals() never gets called unless the hash codes are the same (expected behaviour).

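using System.Collections.Generic;
using System.Data;

class RowValidationComparer : IEqualityComparer<DataRow>
{
    // Two rows count as equal when their "MyField" values match.
    public bool Equals(DataRow x, DataRow y)
    {
        return x.Field<string>("MyField").Equals(y.Field<string>("MyField"));
    }

    // Returns the same constant for every row, so Distinct() always
    // falls through to Equals().
    public int GetHashCode(DataRow obj)
    {
        return typeof(DataRow).GetHashCode();
    }
}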

Upvotes: 2

Views: 9878

Answers (2)

Jeff Ogata

Reputation: 57783

Trying to find unique rows in a data table using Distinct() extension method.

To do this, you can use the DataRowComparer class:

var distinct = dataTable.AsEnumerable().Distinct(DataRowComparer.Default);

For a general explanation from MSDN, discussing the use of set operators such as Distinct on DataRows:

These operators compare source elements by calling the GetHashCode and Equals methods on each collection of elements. In the case of a DataRow, these operators perform a reference comparison, which is generally not the ideal behavior for set operations over tabular data. For set operations, you usually want to determine whether the element values are equal and not the element references. Therefore, the DataRowComparer class has been added to LINQ to DataSet. This class can be used to compare row values.

The DataRowComparer class contains a value comparison implementation for DataRow, so this class can be used for set operations such as Distinct.
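To make the difference concrete, here is a minimal sketch that counts distinct rows with and without DataRowComparer.Default. The class name DistinctRowsDemo, the table layout, and the "Amount" column are made up for illustration; only AsEnumerable(), Distinct() and DataRowComparer.Default come from the answer above.

```csharp
using System;
using System.Data;
using System.Linq;

public static class DistinctRowsDemo
{
    // Builds a small table with one duplicated row (hypothetical sample data).
    public static DataTable BuildSampleTable()
    {
        var table = new DataTable("Sample");
        table.Columns.Add("MyField", typeof(string));
        table.Columns.Add("Amount", typeof(int));
        table.Rows.Add("a", 1);
        table.Rows.Add("a", 1); // same values as the first row, but a different DataRow instance
        table.Rows.Add("b", 2);
        return table;
    }

    // Default comparison: DataRow does not override Equals/GetHashCode,
    // so rows are compared by reference and the duplicate is kept.
    public static int CountDistinctByReference(DataTable table)
    {
        return table.AsEnumerable().Distinct().Count();
    }

    // Value comparison: DataRowComparer.Default compares the column values.
    public static int CountDistinctByValue(DataTable table)
    {
        return table.AsEnumerable().Distinct(DataRowComparer.Default).Count();
    }
}
```

With this sample table, the reference-based count should come out as 3 and the value-based count as 2, since the first two rows hold the same values in every column.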

Upvotes: 4

smaglio81

Reputation: 511

You might try ...

public int GetHashCode(DataRow obj) {
    return obj.Field<string>("MyField").GetHashCode();
}

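This gets more complicated as you include more fields in the hash code. You may also want to add null checks: Field<string>() returns null when the column holds DBNull, and calling GetHashCode() on null throws a NullReferenceException.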
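As a rough sketch of how the multi-field, null-safe version might look: the FieldsComparer class below is hypothetical, not part of any library, and it assumes that comparing the raw column values with object.Equals is acceptable for your data. The 17/31 hash combination is just one common convention.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical comparer: two rows are equal when the listed columns hold equal values.
public sealed class FieldsComparer : IEqualityComparer<DataRow>
{
    private readonly string[] _columns;

    public FieldsComparer(params string[] columns)
    {
        if (columns == null) throw new ArgumentNullException("columns");
        _columns = columns;
    }

    public bool Equals(DataRow x, DataRow y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        foreach (var column in _columns)
        {
            // The static object.Equals copes with DBNull and boxed values without throwing.
            if (!object.Equals(x[column], y[column])) return false;
        }
        return true;
    }

    public int GetHashCode(DataRow row)
    {
        if (row == null) return 0;
        unchecked // let the arithmetic overflow wrap instead of throwing
        {
            int hash = 17;
            foreach (var column in _columns)
            {
                object value = row[column];
                hash = hash * 31 + (value == null ? 0 : value.GetHashCode());
            }
            return hash;
        }
    }
}
```

It would be used the same way as the comparer in the question, for example dataTable.AsEnumerable().Distinct(new FieldsComparer("MyField")), with System.Linq imported at the call site. Because the hash is built from the same columns that Equals() checks, rows that compare equal always land in the same bucket.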

Upvotes: 2
