daniel
daniel

Reputation: 31

ToList slow performance vs foreach slow performance

I am building program that use DataBase with 3 tables(Worker, Task, TaskStep) and i have a method that get date and build report for specific worker of the task and there steps for the specific day.

The data base structure is as follow:

MySQL 5.2

Worker table columns:

workerID(VARCHAR(45)),
name(VARCHAR(45)),
age(int),
...

Tasks table columns:

TaskID(VARCHAR(45)),
description(VARCHAR(45)),
date(DATE),
...

TaskSteps table columns:

TaskStepID(VARCHAR(45)),
description(VARCHAR(45)),
date(DATE),
...

No indexing on any table

The problem is thats it is very very slow!! (~ 20 seconds)

Here is the code:

using WorkerDailyReport = Dictionary<task, IEnumerable<taskStep>>;

private void Buildreport(DateTime date)
{
    var report = new WorkerDailyReport();    

    // Load from DB
    var sw = new Stopwatch();
    sw.Start();

    var startOfDay  = date.Date;
    var endOfDay    = startOfDay.AddDays(1);
    var db          = new WorkEntities();

    const string    workerID   = "80900855";

    IEnumerable<task> _tasks = db.task
                    .Where(ta =>    ta.date     >= startOfDay   &&
                                    ta.date     <  endOfDay     &&
                                    ta.workerID == workerID)
                    .ToList();

    sw.Stop();
    Console.WriteLine("Load From DB time - " + sw.Elapsed + 
                      ", Count - "           + _tasks.Count());   

    // Build the report
    sw.Restart();

    foreach (var t in _tasks)
    {
        var ts = db.taskStep.Where(s => s.taskID == task.taskID);

        report.Add(t, ts);
    }

    sw.Stop();
    Console.WriteLine("Build report time - " + sw.Elapsed);

    // Do somthing with the report
    foreach (var t in report)
    {
        sw.Restart();

        foreach (var subNode in t.Value)
        {
            // Do somthing..
        }

        Console.WriteLine("Do somthing time - " + sw.Elapsed + 
                          ", Count - " + t.Value.Count());
    }
}

As u can see i put StopWatch in each part to check what take so long and this is the results:

1)

If i run the code as above:

Console:

Load From DB time - 00:00:00.0013774, Count - 577

Build report time - 00:00:03.6305722

Do somthing time - 00:00:07.7573754, Count - 21

Do somthing time - 00:00:08.2811928, Count - 11

Do somthing time - 00:00:07.8715531, Count - 14

Do somthing time - 00:00:08.0430597, Count - 0

Do somthing time - 00:00:07.7867790, Count - 9

Do somthing time - 00:00:07.3485209, Count - 39

.........

the inner foreach run takes about 7-9!! Sec to run over no more then 40 record.

2)

If i change only one thing, Add .ToList() after the first query when i load the worker tasks from the Data Base it changes everithing.

Console:

Load From DB time - 00:00:04.3568445, Count - 577

Build report time - 00:00:00.0018535

Do somthing time - 00:00:00.0191099, Count - 21

Do somthing time - 00:00:00.0144895, Count - 11

Do somthing time - 00:00:00.0150208, Count - 14

Do somthing time - 00:00:00.0179021, Count - 0

Do somthing time - 00:00:00.0151372, Count - 9

Do somthing time - 00:00:00.0155703, Count - 39

.........

Now the load from DataBase takes lot more time, 4+ sec. But the Built report time is about ~1ms And each inner foreach takes ~10ms

The first way is imposible(577 * ~8 seconds) and the seconde option is also very slow and i cant see y.

Any idea what happening here?

1) Why the ToList() so slow ?

2) Why without the ToList(), The inner foreach and the Build report is slowing?

How can i make it faster?

thnx.

Upvotes: 2

Views: 4558

Answers (3)

Kirill Bestemyanov
Kirill Bestemyanov

Reputation: 11964

To improve performance, you should use one query to get data from all table:

var _joinedTasks = db.task.Where(ta =>    ta.date     >= startOfDay   &&
                                    ta.date     <  endOfDay     &&
                                    ta.workerID == workerID)
                    .Join(db.taskStep, t => t.taskID, ts=>ts.taskID, (t, ts) => new {t, ts})
                    .GroupBy(g => g.t, v=>v.ts).AsEnumerable();

Then you can add it to dictionary:

var report = _joinedTasks.ToDictionary(g=>g.Key);

And use this report as you wish.

Upvotes: 0

Tom
Tom

Reputation: 26849

LINQ ToList() always evaluates the sequence immediately - in your case SQL query on database.

In the first instance you got Load From DB time - 00:00:00.0013774, Count - 577 - that was quick as you did not run SQL query. However, the query was ran a bit later - that's why you got Build report time - 00:00:03.6305722 (slow).

In the second instance adding ToList() forced evaluation of query immediately (performing SQL) and that's why you got following times:

  • Load From DB time - 00:00:04.3568445, Count - 577 - SQL query on database (slow)
  • Build report time - 00:00:00.0018535 - operating on data already in memory (fast)

Interesting is the fact that your query which returns 577 items took more than 3 seconds. That can happen because of missing indexes on one of the tables.

Bear in mind that if you do not have indexes on a tables, database system needs to perform full table scan to find out all items meeting the following condition:

.Where(ta => ta.date >= startOfDay &&
             ta.date < endOfDay &&
             ta.workerID == workerID)

With growing number of items in Tasks table your query will take longer and longer.

So I would highly recommend create indexes on Tasks.date and Tasks.workerId columns. That should improve initial query time (assuming your database connection is fast i.e. you are not connecting to database deployed over the ocean).

BTW, do not create indexes on all table columns (only on the ones you use in query conditions). That may slow your insert operations down and increase database size.

Unfortunately I cannot advise much more on Do somthing time ... as you did not provide the code. But if you apply same advice, I am sure you get some improvements too.

Upvotes: 0

kazem
kazem

Reputation: 3749

when you don't use .ToList(), C# don't load your data from data base until your first need to fetch data from data base, this is because of lazy loading in entity framework.

and in every step of the inner for-each loop, your program request a query from database, and this is so slow.

but, when you use .ToList(), you run the query immediately and get all records at first, and this is slow. then, in the inner for-each loop your program have all records in memory.

excuse me for my weak flouncy in English speaking :D

Upvotes: 2

Related Questions