Reputation: 1459
I recently experienced a performance problem in Entity Framework querying against SQL Server 2008. I managed to fix the issue, but I don't understand why my fix worked. I'm using a collection of Guids with a .Contains() method to generate an IN clause in SQL. Here's the original code (table names changed to protect the innocent):
Guid[] values = filter.Split(',').Select<String, Guid>(d => new Guid(d)).ToArray();
returnValue = returnValue.Where(t => values.Contains(t.WorkItem.Requirement.Project.ProjectId));
This query takes ~20 seconds for execution when there are > 150 ProjectID's. By changing the location of the .Contains() I can speed things up dramatically. Here's the refactor:
Guid[] values = filter.FilterValue.Split(',').Select<String, Guid>(d => new Guid(d)).ToArray();
var projects = from p in context.DC_DEF_Project
where values.Contains(p.ProjectId)
select p;
returnValue = from t in returnValue
join p in projects on t.DC_DEF_ProjectWorkItem.DC_DEF_ProjectRequirement.ProjectId equals p.ProjectId
select t;
This code takes ~0.125 seconds on the same data set as the above query.
I'm sure there's a sane reason for this, but my curiosity is killing me. What is it?
Upvotes: 3
Views: 337
Reputation: 284
My guess would be that the first one results in sql with a bunch or ORs evaluated against the foreign key on workitems after all the joins where the second joins mach to projects by it's primary key and evaluates the 150 ids only once then joins that to the other tables.
Upvotes: 1
Reputation: 4632
As far as I understood the working of Linq it must have something to do with the deferred execution of the queries.
In your fixed version, you have two Linq expressions that will be evaluated and interpreted one after the other, but executed only once when you enumerate the results (if ever). That neatly reduces database accesses.
The first version:
returnValue.Where(t => values.Contains(t.WorkItem.Requirement.Project.ProjectId))
does not do any filtering in the first step. It will do a check if the ProjectId
of the current table row is contained in values
for every row of the table!
Upvotes: 0
Reputation: 9858
I think in the first example, you are bringing the entire dataset over to your app and then doing an in-memory linq query against it.
Upvotes: 0