Reputation: 155
I have a huge DataTable (around 500k-600k rows) and I want to compute values based on some specific columns. For example, I have three columns named ID, type and value, and I want to total 'value' per 'ID' and 'Type'. I have done it with a DataRow filter: first get the unique 'ID's, then compute the value for each 'type'. That logic gets really complex and takes a long time to process (a simplified sketch of it follows the tables below). I'm not very good with LINQ, so I was wondering whether I can do this better with LINQ or in some other way?
DataTable:
ID   type   value
-----------------
2    100    5
2    100    6
2    200    10
3    200    8
3    200    9
4    100    10
4    200    15
The output I'm looking for is:
ID   Type   Value
-----------------
2    100    11
2    200    10
3    200    17
4    100    10
4    200    15
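To clarify, here is a simplified sketch of the Select-based logic I'm using now (illustrative only, not my exact code):
// using System.Data; using System.Linq;
DataTable totals = new DataTable();
totals.Columns.Add("ID", typeof(int));
totals.Columns.Add("Type", typeof(int));
totals.Columns.Add("Value", typeof(int));

// Distinct (ID, type) pairs, then one Select per pair to sum the matching values.
DataTable pairs = table.DefaultView.ToTable(true, "ID", "type");
foreach (DataRow pair in pairs.Rows)
{
    int id = (int)pair["ID"];
    int type = (int)pair["type"];
    DataRow[] matches = table.Select("ID = " + id + " AND type = " + type);
    totals.Rows.Add(id, type, matches.Sum(r => (int)r["value"]));
}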
Upvotes: 1
Views: 4889
Reputation: 460068
VB.NET (if anybody is interested):
' Requires a reference to System.Data.DataSetExtensions for AsEnumerable().
Dim groups = From r In tbl.AsEnumerable() _
             Group r By IDTypes = _
                 New With {Key .ID = CInt(r("ID")), _
                           Key .Type = CInt(r("Type"))} _
             Into Group _
             Select New With { _
                 .ID = IDTypes.ID, _
                 .Type = IDTypes.Type, _
                 .Value = Group.Sum(Function(grpRow) CInt(grpRow("Value")))}
Here is the test data:
Dim tbl As New DataTable()
Dim row As DataRow
Dim rnd As New Random(Now.Millisecond)

tbl.Columns.Add(New DataColumn("ID", GetType(Int32)))
tbl.Columns.Add(New DataColumn("Type", GetType(Int32)))
tbl.Columns.Add(New DataColumn("Value", GetType(Int32)))

For i As Int32 = 1 To 1000000
    row = tbl.NewRow()
    row("ID") = 2 * rnd.Next(0, 6)
    row("Type") = 100 * rnd.Next(0, 6)
    row("Value") = 5 * rnd.Next(0, 11)
    tbl.Rows.Add(row)
Next
Time measurement for 1,000,000 rows:
Dim watch As New Stopwatch()
watch.Start()
Dim execute = groups.Any() ' Any() forces the deferred query to execute the grouping
watch.Stop()
Console.WriteLine(String.Format("{0:00}:{1:00}:{2:00}.{3:00}", _
                                watch.Elapsed.Hours, _
                                watch.Elapsed.Minutes, _
                                watch.Elapsed.Seconds, _
                                watch.Elapsed.Milliseconds / 10))
Results (on a 2.26 GHz Xeon with 24 GB RAM):
~600 milliseconds for 1,000,000 rows grouped and totaled into ~36 ID/Type groups
Upvotes: 3
Reputation: 126814
I think what you're looking for is something like this. Obviously, where I've used <int>, you would need to replace it with the proper types as appropriate.
var output = from row in table.AsEnumerable()
             let id = row.Field<int>("ID")
             let type = row.Field<int>("type")
             group row by new { id, type } into grp
             select new
             {
                 ID = grp.Key.id,
                 Type = grp.Key.type,
                 Value = grp.Sum(r => r.Field<int>("value"))
             };
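If you just want to inspect the grouped rows, you can enumerate the anonymous-typed result directly, for example:
foreach (var item in output)
{
    Console.WriteLine("{0} {1} {2}", item.ID, item.Type, item.Value);
}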
This is going to result in rather simple code, but arguably it will not be more efficient than a well-written loop (and, of course, if you can offload this to the database instead, you will generally be better off). However, all other things being equal, LINQ code is pretty well optimized and efficient. If you have doubts about efficiency, measure: run both your existing code (if you have it) and the code from the answers, and see where you stand.
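For example, a minimal timing harness could look like this (ComputeWithLoop is a hypothetical stand-in for your existing filter-based code):
var watch = System.Diagnostics.Stopwatch.StartNew();
var linqResults = output.ToList(); // forces the deferred LINQ query to run
watch.Stop();
Console.WriteLine("LINQ: {0} ms", watch.ElapsedMilliseconds);

watch.Restart();
ComputeWithLoop(table); // hypothetical: your current loop/filter-based logic
watch.Stop();
Console.WriteLine("Loop: {0} ms", watch.ElapsedMilliseconds);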
Upvotes: 5
Reputation: 46047
Assuming that you're looking to do some sort of grouping with an aggregate on the Value column, you can do something like this:
DataTable table = new DataTable();

var results = from row in table.AsEnumerable()
              group row by new { Type = row.Field<int>("Type") } into groups
              select new
              {
                  Type = groups.Key.Type,
                  TotalValue = groups.Sum(x => x.Field<int>("Value"))
              };
Upvotes: 0
Reputation: 89092
Why not do it in SQL?
select id, type, sum(value) from TABLE group by id, type
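If the data lives in SQL Server anyway, a sketch of pulling the pre-aggregated rows into a DataTable might look like this (the connection string and table name are placeholders):
// using System.Data; using System.Data.SqlClient;
var table = new DataTable();
using (var connection = new SqlConnection("<your connection string>"))
using (var adapter = new SqlDataAdapter(
    "select id, type, sum(value) as value from YourTable group by id, type",
    connection))
{
    adapter.Fill(table); // the grouping and summing happen on the database server
}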
Upvotes: 5