Bobak_KS
Bobak_KS

Reputation: 554

Super slow performance of querying a DataTable with LINQ

I have a code similar to this structure: my table has 108000 rows. This datatable is really just I read a tab delimited text file to process so I put it in a datatable.

private void Foo(DataTable myDataTable)
{

        List<string> alreadyProcessed = new List<string>();
        foreach(DataRow row in myDataTable.Rows)
        {
            string current = row["Emp"].ToString().Trim();
            if (alreadyProcessed.Contains(current))
                continue;
            alreadyProcessed.Add(current);
            var empRows = from p in myDataTable.AsEnumerable
                where p["Emp"].ToString().Trim() == current
                select new EmpClass
                {
                    LastName = (string) p["lName"],
                    // some more stuff similr
                };

        // do some stuff with that EmpClass but they shouldn't take too long
        }
    }

Running such a thing is taking more than 15 minutes. How can I improve this?

Upvotes: 0

Views: 3408

Answers (3)

Lasse V. Karlsen
Lasse V. Karlsen

Reputation: 391466

Here is a rather naive rewrite of your code.

Instead of tracking which employees you have already processed, let's just group all rows by their employees and process them each separately.

var rowsPerEmployee =
    (from DataRow row in dt.Rows
     let emp = row["Emp"].ToString().Trim()
     group row by emp into g
     select g)
    .ToDictionary(
        g => g.Key,
        g => g.ToArray());

foreach (var current in rowsPerEmployee.Keys)
{
    var empRows = rowsPerEmployee[current];
    ... rest of your code here, note that empRows is not all the rows for a single employee
    ... and not just the lastname or similar
}

This will first group the entire datatable by the employee and create a dictionary from employee to rows for that employee, and then loop on the employees and get the rows.

Upvotes: 1

One thing that might slow things down for you in the linq statement is, how much data you're selecting! you write 'select new EmpClass', and depending on how many columns (and rows/information for that matter) your selecting to become your output may slow things drastically down for you. Other tips and tricks to work on that problem may be found in: http://visualstudiomagazine.com/articles/2010/06/24/five-tips-linq-to-sql.aspx

Upvotes: 0

AD.Net
AD.Net

Reputation: 13399

You should do Group By "EMP", otherwise you're going through each row and for some rows you're querying the whole table. Something like this

from p in myDataTable.AsEnumerable
group p by p.Field<string>("Emp") into g
select new { Emp = g.Key, 
             Data = g.Select(gg=>new EmpClass 
                                  { 
                                     LastName = gg.Field<string>("lName")
                                  }
                             )
           }

Upvotes: 0

Related Questions