Reputation: 16060
I know this has been asked many times, but I cannot find anything that works. I am reading a CSV file and need to remove duplicate lines based on one of the columns, "CustomerID". The CSV file can contain multiple lines with the same CustomerID, and I want to keep only one line per CustomerID.
//DOES NOT WORK
var finalCustomerList = csvCustomerList.Distinct().ToList();
I have also tried this extension method: //DOES NOT WORK
public static IEnumerable<T> RemoveDuplicates<T>(this IEnumerable<T> items)
{
    return new HashSet<T>(items);
}
What does work for me is looping through csvCustomerList and checking whether the customer already exists in the final list. If it doesn't, I add it.
foreach (var csvCustomer in csvCustomerList)
{
    var customer = new customer();
    customer.CustomerID = csvCustomer.CustomerID;
    customer.Name = csvCustomer.Name;
    //etc.....
    // Only add the customer if no entry with the same CustomerID is in the list yet
    var exists = finalCustomerList.Exists(x => x.CustomerID == csvCustomer.CustomerID);
    if (!exists)
    {
        finalCustomerList.Add(customer);
    }
}
Is there a better way of doing this?
Upvotes: 1
Views: 1079
Reputation: 174457
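For Distinct to work with non-standard equality checks, you need to make your class customer implement IEquatable<T>. In the Equals method, simply compare the customer IDs and nothing else. Override GetHashCode as well so that it is based on the customer ID, because Distinct relies on hash codes to find duplicates.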
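A minimal sketch of the IEquatable<T> approach. It assumes the customer class has just the CustomerID and Name properties shown in the question and that CustomerID is an int; both are assumptions, so adapt them to the real class. EquatableDemo and its sample data are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for the question's class; the property types are assumed.
public class customer : IEquatable<customer>
{
    public int CustomerID { get; set; }
    public string Name { get; set; }

    // Two customers count as equal when their ids match, nothing else.
    public bool Equals(customer other)
    {
        return other != null && CustomerID == other.CustomerID;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as customer);
    }

    // Must agree with Equals: customers with equal ids return equal hash codes.
    public override int GetHashCode()
    {
        return CustomerID.GetHashCode();
    }
}

// Hypothetical demo class with invented sample data.
public static class EquatableDemo
{
    public static List<customer> Run()
    {
        var csvCustomerList = new List<customer>
        {
            new customer { CustomerID = 1, Name = "Ann" },
            new customer { CustomerID = 2, Name = "Bob" },
            new customer { CustomerID = 1, Name = "Ann (second line)" },
        };

        // Distinct now uses the id-based equality defined above.
        return csvCustomerList.Distinct().ToList();
    }
}
```

With this in place, the plain csvCustomerList.Distinct().ToList() call from the question should return one entry per CustomerID. The trade-off is that id-only equality then applies everywhere the class is compared, not just in this one query.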
As an alternative, you can use the overload of Distinct that takes an IEqualityComparer<T> and create a class that implements that interface for customer. That way, you don't need to change the customer class.
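A rough sketch of the comparer approach. CustomerIdComparer and ComparerDemo are hypothetical names, and the customer class is a stand-in with an assumed int CustomerID; only the Distinct overload and the IEqualityComparer<T> interface come from the framework.

```csharp
using System.Collections.Generic;
using System.Linq;

// Stand-in for the question's class; it stays free of any equality logic.
public class customer
{
    public int CustomerID { get; set; }
    public string Name { get; set; }
}

// Hypothetical comparer that treats customers with the same id as equal.
public sealed class CustomerIdComparer : IEqualityComparer<customer>
{
    public bool Equals(customer x, customer y)
    {
        if (ReferenceEquals(x, y)) return true;   // same instance, or both null
        if (x is null || y is null) return false; // exactly one is null
        return x.CustomerID == y.CustomerID;
    }

    // Hash only the id so that it stays consistent with Equals.
    public int GetHashCode(customer obj)
    {
        return obj.CustomerID.GetHashCode();
    }
}

public static class ComparerDemo
{
    public static List<customer> RemoveDuplicates(IEnumerable<customer> csvCustomerList)
    {
        // The comparer is passed per call; other code comparing customers is unaffected.
        return csvCustomerList.Distinct(new CustomerIdComparer()).ToList();
    }
}
```

This keeps the "same customer" rule local to the deduplication step, which can be handy if other parts of the code need a different notion of equality.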
Or you can use MoreLINQ, as suggested in another answer.
Upvotes: 4
Reputation: 18832
For a simple solution, check out MoreLINQ by Jon Skeet and others.
It has a DistinctBy operator that lets you perform a distinct operation by any field. So you could do something like:
var finalCustomerList = csvCustomerList.DistinctBy(c => c.CustomerID).ToList();
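If adding a library is not an option, the same idea can be sketched by hand in a few lines. This is not MoreLINQ's code; DistinctByKey, DedupExtensions, DistinctByDemo and the stand-in customer class are hypothetical names for illustration. On .NET 6 and later, System.Linq also ships its own Enumerable.DistinctBy, which is why the helper here uses a different name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DedupExtensions
{
    // Hypothetical helper: yields the first item seen for each key, in input order.
    public static IEnumerable<TSource> DistinctByKey<TSource, TKey>(
        this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
    {
        var seenKeys = new HashSet<TKey>();
        foreach (var item in source)
        {
            // HashSet.Add returns false when the key was already present.
            if (seenKeys.Add(keySelector(item)))
            {
                yield return item;
            }
        }
    }
}

// Stand-in for the question's class; the property types are assumed.
public class customer
{
    public int CustomerID { get; set; }
    public string Name { get; set; }
}

public static class DistinctByDemo
{
    public static List<customer> RemoveDuplicates(IEnumerable<customer> csvCustomerList)
    {
        return csvCustomerList.DistinctByKey(c => c.CustomerID).ToList();
    }
}
```

Compared with calling Exists inside the loop, which rescans the result list for every row, the set lookup keeps the whole pass roughly linear in the number of rows.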
Upvotes: 3