darkdante
darkdante

Reputation: 707

C# - Remove rows with the same column value from a DataTable

I have a DataTable which looks like this:

 ID   Name    DateBirth
.......................
 1     aa      1.1.11
 2     bb      2.3.11
 2     cc      1.2.12
 3     cd      2.3.12

Which is the fastest way to remove the rows with the same ID, to get something like this (keep the first occurrence, delete the next ones):

 ID   Name    DateBirth
.......................
 1     aa      1.1.11
 2     bb      2.3.11
 3     cd      2.3.12

I don't want to double pass the table rows, because the row number is big. I want to use some LinQ if possible, but I guess it will be a big query and I have to use a comparer.

Upvotes: 3

Views: 13342

Answers (6)

Moho
Moho

Reputation: 16563

  1. Get a record count for each ID
var rowsToDelete = 
    (from row in dataTable.AsEnumerable()
    group row by row.ID into g
    where g.Count() > 1
  1. Determine which record to keep (don't know your criteria; I will just sort by DoB then Name and keep first record) and select the rest
select g.OrderBy( dr => dr.Field<DateTime>( "DateBirth" ) ).ThenBy( dr => dr.Field<string>( "Name" ) ).Skip(1))
  1. Flatten
.SelectMany( g => g );
  1. Delete rows
rowsToDelete.ForEach( dr => dr.Delete() );
  1. Accept changes
dataTable.AcceptChanges();

Upvotes: 2

Sandy
Sandy

Reputation: 11727

I was solving the same situation and found it quite interesting and would like to share my finding.

  1. If rows are to be distinct based on ALL COLUMNS.
DataTable newDatatable = dt.DefaultView.ToTable(true, "ID", "Name", "DateBirth");

The columns you mention here, only those will be returned back in newDatatable.

  1. If distinct based on one column and column type is int then I would prefer LINQ query.
  DataTable newDatatable = dt.AsEnumerable()
                           .GroupBy(dr => dr.Field<int>("ID"))
                           .Select(dg => dg).Take(1)
                           .CopyToDataTable();
  1. If distinct based on one column and column type is string then I would prefer loop.
List<string> toExclude = new List<string>();
for (int i = 0; i < dt.Rows.Count; i++)
{
    var idValue = (string)dt.Rows[i]["ID"];
    if (toExclude.Contains(idValue))
    {
        dt.Rows.Remove(dt.Rows[i]);
        i--;
    }
    toExclude.Add(glAccount);
}

Third being my favorite.

I may have answered few things which are not asked in the question. It was done in good intent and with little excitement as well.

Hope it helps.

Upvotes: 3

Satinder singh
Satinder singh

Reputation: 10218

Heres a way to achive this, All you need to use moreLinq library use its function DistinctBy

Code:

protected void Page_Load(object sender, EventArgs e)
{
  var DistinctByIdColumn = getDT2().AsEnumerable()
                                   .DistinctBy(
                                   row => new { Id = row["Id"] });
  DataTable dtDistinctByIdColumn = DistinctByIdColumn.CopyToDataTable();
}


public DataTable getDT2()
{
   DataTable dt = new DataTable();
   dt.Columns.Add("Id", typeof(string));
   dt.Columns.Add("Name", typeof(string));
   dt.Columns.Add("Dob", typeof(string));
   dt.Rows.Add("1", "aa","1.1.11");
   dt.Rows.Add("2", "bb","2.3.11");
   dt.Rows.Add("2", "cc","1.2.12");
   dt.Rows.Add("3", "cd","2.3.12");
   return dt;
}

OutPut: As what you expected

enter image description here

For moreLinq sample code view my blog

Upvotes: 1

Tim Schmelter
Tim Schmelter

Reputation: 460340

Not necessarily the most efficient approach, but maybe the most readable:

table = table.AsEnumerable()
    .GroupBy(row => row.Field<int>("ID"))
    .Select(rowGroup => rowGroup.First())
    .CopyToDataTable();

Linq is also more powerful. For example, if you want to change the logic and not select the first (arbitrary) row of each id-group but the last according to DateBirth:

table = table.AsEnumerable()
    .GroupBy(row => row.Field<int>("ID"))
    .Select(rowGroup => rowGroup
                          .OrderByDescending(r => r.Field<DateTime>("DateBirth"))
                          .First())
    .CopyToDataTable();

Upvotes: 2

cuongle
cuongle

Reputation: 75326

You can use LINQ to DataTable, to distinct based on column ID, you can group by on this column, then do select first:

  var result = dt.AsEnumerable()
                 .GroupBy(r => r.Field<int>("ID"))
                 .Select(g => g.First())
                 .CopyToDataTable();

Upvotes: 10

COLD TOLD
COLD TOLD

Reputation: 13599

you can try this

DataTable uniqueCols = dt.DefaultView.ToTable(true, "ID");

Upvotes: 2

Related Questions