Reputation: 31
I am building program that use DataBase with 3 tables(Worker, Task, TaskStep) and i have a method that get date and build report for specific worker of the task and there steps for the specific day.
The data base structure is as follow:
MySQL 5.2
Worker
table columns:
workerID(VARCHAR(45)),
name(VARCHAR(45)),
age(int),
...
Tasks
table columns:
TaskID(VARCHAR(45)),
description(VARCHAR(45)),
date(DATE),
...
TaskSteps
table columns:
TaskStepID(VARCHAR(45)),
description(VARCHAR(45)),
date(DATE),
...
No indexing on any table
The problem is thats it is very very slow!! (~ 20 seconds)
Here is the code:
using WorkerDailyReport = Dictionary<task, IEnumerable<taskStep>>;
private void Buildreport(DateTime date)
{
var report = new WorkerDailyReport();
// Load from DB
var sw = new Stopwatch();
sw.Start();
var startOfDay = date.Date;
var endOfDay = startOfDay.AddDays(1);
var db = new WorkEntities();
const string workerID = "80900855";
IEnumerable<task> _tasks = db.task
.Where(ta => ta.date >= startOfDay &&
ta.date < endOfDay &&
ta.workerID == workerID)
.ToList();
sw.Stop();
Console.WriteLine("Load From DB time - " + sw.Elapsed +
", Count - " + _tasks.Count());
// Build the report
sw.Restart();
foreach (var t in _tasks)
{
var ts = db.taskStep.Where(s => s.taskID == task.taskID);
report.Add(t, ts);
}
sw.Stop();
Console.WriteLine("Build report time - " + sw.Elapsed);
// Do somthing with the report
foreach (var t in report)
{
sw.Restart();
foreach (var subNode in t.Value)
{
// Do somthing..
}
Console.WriteLine("Do somthing time - " + sw.Elapsed +
", Count - " + t.Value.Count());
}
}
As u can see i put StopWatch in each part to check what take so long and this is the results:
1)
If i run the code as above:
Console:
Load From DB time - 00:00:00.0013774, Count - 577
Build report time - 00:00:03.6305722
Do somthing time - 00:00:07.7573754, Count - 21
Do somthing time - 00:00:08.2811928, Count - 11
Do somthing time - 00:00:07.8715531, Count - 14
Do somthing time - 00:00:08.0430597, Count - 0
Do somthing time - 00:00:07.7867790, Count - 9
Do somthing time - 00:00:07.3485209, Count - 39
.........
the inner foreach run takes about 7-9!! Sec to run over no more then 40 record.
2)
If i change only one thing, Add .ToList() after the first query when i load the worker tasks from the Data Base it changes everithing.
Console:
Load From DB time - 00:00:04.3568445, Count - 577
Build report time - 00:00:00.0018535
Do somthing time - 00:00:00.0191099, Count - 21
Do somthing time - 00:00:00.0144895, Count - 11
Do somthing time - 00:00:00.0150208, Count - 14
Do somthing time - 00:00:00.0179021, Count - 0
Do somthing time - 00:00:00.0151372, Count - 9
Do somthing time - 00:00:00.0155703, Count - 39
.........
Now the load from DataBase takes lot more time, 4+ sec. But the Built report time is about ~1ms And each inner foreach takes ~10ms
The first way is imposible(577 * ~8 seconds) and the seconde option is also very slow and i cant see y.
Any idea what happening here?
1) Why the ToList()
so slow ?
2) Why without the ToList()
, The inner foreach
and the Build report is slowing?
How can i make it faster?
thnx.
Upvotes: 2
Views: 4558
Reputation: 11964
To improve performance, you should use one query to get data from all table:
var _joinedTasks = db.task.Where(ta => ta.date >= startOfDay &&
ta.date < endOfDay &&
ta.workerID == workerID)
.Join(db.taskStep, t => t.taskID, ts=>ts.taskID, (t, ts) => new {t, ts})
.GroupBy(g => g.t, v=>v.ts).AsEnumerable();
Then you can add it to dictionary:
var report = _joinedTasks.ToDictionary(g=>g.Key);
And use this report as you wish.
Upvotes: 0
Reputation: 26849
LINQ ToList()
always evaluates the sequence immediately - in your case SQL query on database.
In the first instance you got Load From DB time - 00:00:00.0013774, Count - 577
- that was quick as you did not run SQL query. However, the query was ran a bit later - that's why you got Build report time - 00:00:03.6305722
(slow).
In the second instance adding ToList()
forced evaluation of query immediately (performing SQL) and that's why you got following times:
Load From DB time - 00:00:04.3568445, Count - 577
- SQL query on database (slow)Build report time - 00:00:00.0018535
- operating on data already in memory (fast)Interesting is the fact that your query which returns 577 items took more than 3 seconds. That can happen because of missing indexes on one of the tables.
Bear in mind that if you do not have indexes on a tables, database system needs to perform full table scan to find out all items meeting the following condition:
.Where(ta => ta.date >= startOfDay &&
ta.date < endOfDay &&
ta.workerID == workerID)
With growing number of items in Tasks
table your query will take longer and longer.
So I would highly recommend create indexes on Tasks.date
and Tasks.workerId
columns. That should improve initial query time (assuming your database connection is fast i.e. you are not connecting to database deployed over the ocean).
BTW, do not create indexes on all table columns (only on the ones you use in query conditions). That may slow your insert operations down and increase database size.
Unfortunately I cannot advise much more on Do somthing time ...
as you did not provide the code. But if you apply same advice, I am sure you get some improvements too.
Upvotes: 0
Reputation: 3749
when you don't use .ToList(), C# don't load your data from data base until your first need to fetch data from data base, this is because of lazy loading in entity framework.
and in every step of the inner for-each loop, your program request a query from database, and this is so slow.
but, when you use .ToList(), you run the query immediately and get all records at first, and this is slow. then, in the inner for-each loop your program have all records in memory.
excuse me for my weak flouncy in English speaking :D
Upvotes: 2