Reputation: 707
I have a DataTable
which looks like this:
ID Name DateBirth
.......................
1 aa 1.1.11
2 bb 2.3.11
2 cc 1.2.12
3 cd 2.3.12
Which is the fastest way to remove the rows with the same ID, to get something like this (keep the first occurrence, delete the next ones):
ID Name DateBirth
.......................
1 aa 1.1.11
2 bb 2.3.11
3 cd 2.3.12
I don't want to double pass the table rows, because the row number is big. I want to use some LinQ if possible, but I guess it will be a big query and I have to use a comparer.
Upvotes: 3
Views: 13342
Reputation: 16563
ID
var rowsToDelete =
(from row in dataTable.AsEnumerable()
group row by row.ID into g
where g.Count() > 1
Name
and keep first record) and select the rest select g.OrderBy( dr => dr.Field<DateTime>( "DateBirth" ) ).ThenBy( dr => dr.Field<string>( "Name" ) ).Skip(1))
.SelectMany( g => g );
rowsToDelete.ForEach( dr => dr.Delete() );
dataTable.AcceptChanges();
Upvotes: 2
Reputation: 11727
I was solving the same situation and found it quite interesting and would like to share my finding.
DataTable newDatatable = dt.DefaultView.ToTable(true, "ID", "Name", "DateBirth");
The columns you mention here, only those will be returned back in newDatatable
.
LINQ
query. DataTable newDatatable = dt.AsEnumerable()
.GroupBy(dr => dr.Field<int>("ID"))
.Select(dg => dg).Take(1)
.CopyToDataTable();
List<string> toExclude = new List<string>();
for (int i = 0; i < dt.Rows.Count; i++)
{
var idValue = (string)dt.Rows[i]["ID"];
if (toExclude.Contains(idValue))
{
dt.Rows.Remove(dt.Rows[i]);
i--;
}
toExclude.Add(glAccount);
}
Third being my favorite.
I may have answered few things which are not asked in the question. It was done in good intent and with little excitement as well.
Hope it helps.
Upvotes: 3
Reputation: 10218
Heres a way to achive this,
All you need to use moreLinq library use its function DistinctBy
Code:
protected void Page_Load(object sender, EventArgs e)
{
var DistinctByIdColumn = getDT2().AsEnumerable()
.DistinctBy(
row => new { Id = row["Id"] });
DataTable dtDistinctByIdColumn = DistinctByIdColumn.CopyToDataTable();
}
public DataTable getDT2()
{
DataTable dt = new DataTable();
dt.Columns.Add("Id", typeof(string));
dt.Columns.Add("Name", typeof(string));
dt.Columns.Add("Dob", typeof(string));
dt.Rows.Add("1", "aa","1.1.11");
dt.Rows.Add("2", "bb","2.3.11");
dt.Rows.Add("2", "cc","1.2.12");
dt.Rows.Add("3", "cd","2.3.12");
return dt;
}
OutPut: As what you expected
For moreLinq sample code view my blog
Upvotes: 1
Reputation: 460340
Not necessarily the most efficient approach, but maybe the most readable:
table = table.AsEnumerable()
.GroupBy(row => row.Field<int>("ID"))
.Select(rowGroup => rowGroup.First())
.CopyToDataTable();
Linq is also more powerful. For example, if you want to change the logic and not select the first (arbitrary) row of each id-group but the last according to DateBirth
:
table = table.AsEnumerable()
.GroupBy(row => row.Field<int>("ID"))
.Select(rowGroup => rowGroup
.OrderByDescending(r => r.Field<DateTime>("DateBirth"))
.First())
.CopyToDataTable();
Upvotes: 2
Reputation: 75326
You can use LINQ to DataTable, to distinct based on column ID
, you can group by on this column, then do select first:
var result = dt.AsEnumerable()
.GroupBy(r => r.Field<int>("ID"))
.Select(g => g.First())
.CopyToDataTable();
Upvotes: 10
Reputation: 13599
you can try this
DataTable uniqueCols = dt.DefaultView.ToTable(true, "ID");
Upvotes: 2