Chris
Chris

Reputation: 170

Take specific amount of rows from table dependent on column value

I'm looking for guidance on how to programmatically limit the number of rows in a DataTable, depending on a specific column value.

Similar to how take(n) would allow me to get a specific number of rows:

// Returns first 10 rows
dt.Rows.Cast<DataRow>().Take(10).CopyToDataTable();

Say I had a DataTable with 4 rows


+-----------------------------------+------------+
|               Col1                |    Col2    |
+-----------------------------------+------------+
| FRED                              |            |
+-----------------------------------+------------+
| BILL                              |            |
+-----------------------------------+------------+
| BILL                              |            |
+-----------------------------------+------------+
| BOB                               |            |
+-----------------------------------+------------+

With this scenario take(2) would return 2 rows, but I want similiar functionality to return 3 as Col1's value is the same in row 2 and 3.

Would this be fastest with a loop:

var count = 0;

for(int i=0; i < dt.Rows.Count; i++)
{
    // add to new DataTable

    if(i > 0 && Convert.ToString(dt.Rows[i]["Col1"]) != Convert.ToString(dt.Rows[i-1]["Col1"]))
    {
        // check next row
        count++;
        if (count == 10) break;
    }
}

or is there another easier solution?

I'm looking for the distinct values limited to a specific number, but to also get all the duplicates.

Upvotes: 1

Views: 113

Answers (2)

nvoigt
nvoigt

Reputation: 77324

You said in a comment: "I want the distinct values limited to a specific number, but get all the duplicates.". So you only need to take rows until the distinct number of values taken is larger than your limit. For that purpose, a hashset is the easiest option, it will automatically have each distinct value only once, even when you add them multiple times.

var set = new HashSet<string>();
var limit = 2;

var resultingRows = sequence.TakeWhile(row => { set.Add(row[col]); return set.Count <= limit; });

Make sure sequence is materialized and ordered by that column.

Upvotes: 1

Konamiman
Konamiman

Reputation: 50293

So if I got it right, you want to get a list of rows excluding duplicates, where two rows are considered equal if they have the same value in Col1, right?

In that case, you might define an IEqualityComparer for your rows:

class RowComparer : IEqualityComparer<DataRow>
{
    public bool Equals(DataRow x, DataRow y)
    {
        return x["Col1"] == y["Col1"];
    }

    public int GetHashCode(DataRow obj)
    {
        return obj["Col1"].GetHashCode();
    }
}

...and then get the "unique" rows by using LINQ's Distinct:

var uniqueRows = dt.Rows.Cast<DataRow>().Distinct(new RowComparer())

Upvotes: 0

Related Questions