Using Linq to remove duplicate by column values based on a second column

Question

I have a datatable with two columns "DT" and "ASOF", both dates. There are rows has duplicate values of DT, which I find by doing:

      var duplicateGroups = dt.AsEnumerable()
                                  .GroupBy(row => row.Field("DT"))
                                  .Where(g => g.Count() > 1);

But there is another column, "ASOF" that I want to get rid of one of the duplicate DTs based on the "ASOF" value, basically what ever ASOF date is newest.

I can think of a for loop that will be able to do this by creating an array of the duplicates, finding the newest ASOF, and then removing all others from the DB table with that value. However, I feel like Linq has some ability to be able to do that, possibly with a comparator. But I have some very basic knowledge on it.

Any ideas?

D Stanley · Accepted Answer

Unless you really need to modify the original DataTable, you could just create a projection that will give you the records in each DT group that has the "newest" ASOF value:

  var rows = dt.AsEnumerable()
               .OrderBy(row => row.Field("ASOF"))
               .GroupBy(row => row.Field("DT"))
               .Select(rg => rg.First())
               .ToList();

Then either clear your original data table and replace it with these rows or create a new DataTable based on these rows and swap it with the original.

Another option is to filter the data before it's loaded to the data table, but since you didn't specify how that happens I can't offer any guidance there.

Using Linq to remove duplicate by column values based on a second column

Answers (1)

Related Questions