dpac

Reputation: 155

Efficient way of processing huge Datatable in C#.net

I have a huge DataTable (around 500k-600k rows). I want to aggregate rows based on specific columns. For example, I have 3 columns named ID, type and value, and I want to sum the 'value' column for each 'type' within each 'ID'. I have done it using a DataRow filter: first get the unique 'ID' values, then for each 'type' compute the value. This logic gets really complex and takes a long time to process. I'm not very good at LINQ, so I was wondering if I can do it better using LINQ or some other way?

DataTable:

ID       type      value
--------------------------------
2         100         5
2         100         6
2         200         10
3         200         8
3         200         9
4         100         10
4         200         15

The output I'm looking for is:

ID       Type      Value
--------------------------------
2         100         11
2         200         10
3         200         17
4         100         10
4         200         15

Upvotes: 1

Views: 4889

Answers (4)

Tim Schmelter

Reputation: 460068

VB.NET (if anybody is interested):

Dim groups = From r In tbl
             Group r By IDTypes = _
             New With {Key .ID = CInt(r("ID")), _
                       Key .Type = CInt(r("Type"))}
                  Into Group
             Select New With { _
                    .ID = IDTypes.ID, _
                    .Type = IDTypes.Type, _
                    .Value = Group.Sum(Function(grpRow) (CInt(grpRow("Value"))))}

Here is the test data:

Dim tbl As New DataTable
Dim row As DataRow
Dim rnd As New Random(Now.Millisecond)
tbl.Columns.Add(New DataColumn("ID", GetType(Int32)))
tbl.Columns.Add(New DataColumn("Type", GetType(Int32)))
tbl.Columns.Add(New DataColumn("Value", GetType(Int32)))
For i As Int32 = 1 To 1000000
    row = tbl.NewRow
    row("ID") = 2 * Rnd.Next(0, 6)
    row("Type") = 100 * Rnd.Next(0, 6)
    row("Value") = 5 * Rnd.Next(0, 11)
    tbl.Rows.Add(row)
Next

Time measurement for 1,000,000 rows:

Dim watch As New Stopwatch() ' from System.Diagnostics
watch.Start()
Dim execute = groups.Any()
watch.Stop()
Console.WriteLine(String.Format("{0:00}:{1:00}:{2:00}.{3:00}", _
                                        watch.Elapsed.Hours, _
                                        watch.Elapsed.Minutes, _
                                        watch.Elapsed.Seconds, _
                                        watch.Elapsed.Milliseconds / 10))

Results (on a 2.26 GHz Xeon, 24 GB):

  1. 00:00:00.61
  2. 00:00:00.58
  3. 00:00:00.63

~600 milliseconds for 1,000,000 rows, grouped and totaled into ~36 ID/Type combinations.
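For readers working in C#, the same kind of measurement could be sketched roughly as follows. This is an illustration, not a port of the code above: the class and method names (`GroupingBenchmark`, `BuildTestTable`, `TimeGrouping`) are made up for the example, and it calls `ToList()` rather than `Any()` so that every group and every sum is computed inside the timed region.

```csharp
using System;
using System.Data;
using System.Diagnostics;
using System.Linq;

public static class GroupingBenchmark
{
    // Builds a table of random rows shaped like the test data above.
    // The seed parameter is an addition so that runs are repeatable.
    public static DataTable BuildTestTable(int rowCount, int seed)
    {
        var table = new DataTable();
        table.Columns.Add("ID", typeof(int));
        table.Columns.Add("Type", typeof(int));
        table.Columns.Add("Value", typeof(int));

        var random = new Random(seed);
        for (int i = 0; i < rowCount; i++)
        {
            table.Rows.Add(2 * random.Next(0, 6), 100 * random.Next(0, 6), 5 * random.Next(0, 11));
        }
        return table;
    }

    // Groups by (ID, Type), sums Value, and returns the number of groups.
    // The elapsed time of the grouping is reported through the out parameter.
    public static int TimeGrouping(DataTable table, out TimeSpan elapsed)
    {
        var watch = Stopwatch.StartNew();
        var groups = table.AsEnumerable()
            .GroupBy(r => new { ID = r.Field<int>("ID"), Type = r.Field<int>("Type") })
            .Select(g => new { g.Key.ID, g.Key.Type, Value = g.Sum(r => r.Field<int>("Value")) })
            .ToList(); // forces every group and every sum to be evaluated
        watch.Stop();

        elapsed = watch.Elapsed;
        return groups.Count;
    }
}
```

Calling `GroupingBenchmark.TimeGrouping(GroupingBenchmark.BuildTestTable(1000000, 42), out var elapsed)` and printing `elapsed` would give a figure to compare against; the actual timing depends on the machine.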

Upvotes: 3

Anthony Pegram

Reputation: 126814

I think what you're looking for is something like this. Obviously, where I've used <int>, you would need to replace with proper types as appropriate.

var output = from row in table.AsEnumerable()
             let id = row.Field<int>("ID")
             let type = row.Field<int>("type")
             group row by new { id, type } into grp 
             select new 
             {
                 ID = grp.Key.id,
                 Type = grp.Key.type,
                 Value = grp.Sum(r => r.Field<int>("value"))
             };

This results in rather simple code, but it will arguably not be more efficient than a well-written loop (and, of course, if you can offload this to the database instead, you will generally be better off). However, all things being equal, LINQ code is pretty well optimized and efficient. If you have doubts about efficiency, measure: run both your existing code (if you have it) and the code from the answers, and see where you stand.
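For comparison, a "well-written loop" might look something like the sketch below: a single pass over the rows that accumulates totals in a dictionary keyed by (ID, type). The names `GroupSum` and `SumByIdAndType` are hypothetical, and the sketch assumes all three columns are of type `int` with no `DBNull` values, and a compiler that supports value tuples (C# 7 or later).

```csharp
using System.Collections.Generic;
using System.Data;

public static class GroupSum
{
    // Sums the "value" column per (ID, type) pair in one pass over the rows.
    public static Dictionary<(int Id, int Type), int> SumByIdAndType(DataTable table)
    {
        // Resolve the column positions once so the loop indexes by ordinal, not by name.
        int idCol = table.Columns["ID"].Ordinal;
        int typeCol = table.Columns["type"].Ordinal;
        int valueCol = table.Columns["value"].Ordinal;

        var totals = new Dictionary<(int Id, int Type), int>();
        foreach (DataRow row in table.Rows)
        {
            var key = ((int)row[idCol], (int)row[typeCol]);
            totals.TryGetValue(key, out int sum); // sum is 0 when the key is not present yet
            totals[key] = sum + (int)row[valueCol];
        }
        return totals;
    }
}
```

Whether this is actually faster than the LINQ query is something to measure on your own data rather than assume.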

Upvotes: 5

James Johnson

Reputation: 46047

Assuming that you're looking to do some sort of grouping, with an aggregate of some sort on the value column, you can do something like this:

DataTable table = new DataTable();

var results = from row in table.AsEnumerable()
              group row by new { Type = row.Field<int>("Type") } into groups
              select new
              {
                  Type = groups.Key.Type,
                  TotalValue = groups.Sum(x => x.Field<int>("Value"))
              };
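If the grouping should also include ID, as in the expected output in the question, and the result is needed as a DataTable again, the query could be extended along these lines. This is a sketch under assumptions: the helper name `GroupedTable.SumByIdAndType` is made up, the columns are assumed to be `int`, and the `orderby` is only there to make the row order predictable.

```csharp
using System.Data;
using System.Linq;

public static class GroupedTable
{
    // Returns a new table with one row per (ID, Type) pair and the summed Value.
    public static DataTable SumByIdAndType(DataTable source)
    {
        var result = new DataTable();
        result.Columns.Add("ID", typeof(int));
        result.Columns.Add("Type", typeof(int));
        result.Columns.Add("Value", typeof(int));

        var groups = from row in source.AsEnumerable()
                     group row by new
                     {
                         ID = row.Field<int>("ID"),
                         Type = row.Field<int>("Type")
                     } into g
                     orderby g.Key.ID, g.Key.Type
                     select new
                     {
                         g.Key.ID,
                         g.Key.Type,
                         Total = g.Sum(r => r.Field<int>("Value"))
                     };

        // Copy each aggregated group into the result table.
        foreach (var g in groups)
        {
            result.Rows.Add(g.ID, g.Type, g.Total);
        }
        return result;
    }
}
```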

Upvotes: 0

Jason

Reputation: 89092

Why not do it in SQL?

select id, type, sum(value) from TABLE group by id, type
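If the data does come from a database, one way to run that query from C# and get the already-aggregated rows into a DataTable might look like the sketch below. It is provider-agnostic and rests on several assumptions: `MyTable` is a placeholder for the real table name, `GroupedQuery` and `Load` are hypothetical names, the caller passes in a connection that is already open, and the column names are accepted as-is by the database (some dialects may require quoting names such as `value` or `type`).

```csharp
using System.Data;

public static class GroupedQuery
{
    // "MyTable" is a placeholder; substitute the real table name.
    public const string Sql =
        "SELECT id, type, SUM(value) AS value FROM MyTable GROUP BY id, type";

    // Runs the aggregate query on an already-open connection and
    // loads the grouped rows into a new DataTable.
    public static DataTable Load(IDbConnection connection)
    {
        var table = new DataTable();
        using (IDbCommand command = connection.CreateCommand())
        {
            command.CommandText = Sql;
            using (IDataReader reader = command.ExecuteReader())
            {
                table.Load(reader);
            }
        }
        return table;
    }
}
```

This way only one row per (id, type) combination crosses the wire, instead of the 500k-600k source rows.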

Upvotes: 5
