Reputation: 170
I'm looking for guidance on how to programmatically limit the number of rows in a DataTable
, depending on a specific column value.
Similar to how take(n)
would allow me to get a specific number of rows:
// Returns first 10 rows
dt.Rows.Cast<DataRow>().Take(10).CopyToDataTable();
Say I had a DataTable
with 4 rows
+-----------------------------------+------------+ | Col1 | Col2 | +-----------------------------------+------------+ | FRED | | +-----------------------------------+------------+ | BILL | | +-----------------------------------+------------+ | BILL | | +-----------------------------------+------------+ | BOB | | +-----------------------------------+------------+
With this scenario take(2)
would return 2 rows, but I want similiar functionality to return 3 as Col1's value is the same in row 2 and 3.
Would this be fastest with a loop:
var count = 0;
for(int i=0; i < dt.Rows.Count; i++)
{
// add to new DataTable
if(i > 0 && Convert.ToString(dt.Rows[i]["Col1"]) != Convert.ToString(dt.Rows[i-1]["Col1"]))
{
// check next row
count++;
if (count == 10) break;
}
}
or is there another easier solution?
I'm looking for the distinct values limited to a specific number, but to also get all the duplicates.
Upvotes: 1
Views: 113
Reputation: 77324
You said in a comment: "I want the distinct values limited to a specific number, but get all the duplicates.". So you only need to take rows until the distinct number of values taken is larger than your limit. For that purpose, a hashset is the easiest option, it will automatically have each distinct value only once, even when you add them multiple times.
var set = new HashSet<string>();
var limit = 2;
var resultingRows = sequence.TakeWhile(row => { set.Add(row[col]); return set.Count <= limit; });
Make sure sequence is materialized and ordered by that column.
Upvotes: 1
Reputation: 50293
So if I got it right, you want to get a list of rows excluding duplicates, where two rows are considered equal if they have the same value in Col1
, right?
In that case, you might define an IEqualityComparer for your rows:
class RowComparer : IEqualityComparer<DataRow>
{
public bool Equals(DataRow x, DataRow y)
{
return x["Col1"] == y["Col1"];
}
public int GetHashCode(DataRow obj)
{
return obj["Col1"].GetHashCode();
}
}
...and then get the "unique" rows by using LINQ's Distinct
:
var uniqueRows = dt.Rows.Cast<DataRow>().Distinct(new RowComparer())
Upvotes: 0