SamuelWarren
SamuelWarren

Reputation: 1459

Entity Framework Two queries with same results, but one is faster

I recently experienced a performance problem in Entity Framework querying against SQL Server 2008. I managed to fix the issue, but I don't understand why my fix worked. I'm using a collection of Guids with a .Contains() method to generate an IN clause in SQL. Here's the original code (table names changed to protect the innocent):

Guid[] values = filter.Split(',').Select<String, Guid>(d => new Guid(d)).ToArray();


returnValue = returnValue.Where(t => values.Contains(t.WorkItem.Requirement.Project.ProjectId));

This query takes ~20 seconds for execution when there are > 150 ProjectID's. By changing the location of the .Contains() I can speed things up dramatically. Here's the refactor:

Guid[] values = filter.FilterValue.Split(',').Select<String, Guid>(d => new Guid(d)).ToArray();

var projects = from p in context.DC_DEF_Project
               where values.Contains(p.ProjectId)
               select p;

returnValue = from t in returnValue
              join p in projects on t.DC_DEF_ProjectWorkItem.DC_DEF_ProjectRequirement.ProjectId equals p.ProjectId
              select t;

This code takes ~0.125 seconds on the same data set as the above query.

I'm sure there's a sane reason for this, but my curiosity is killing me. What is it?

Upvotes: 3

Views: 337

Answers (3)

Bill
Bill

Reputation: 284

My guess would be that the first one results in sql with a bunch or ORs evaluated against the foreign key on workitems after all the joins where the second joins mach to projects by it's primary key and evaluates the 150 ids only once then joins that to the other tables.

Upvotes: 1

Jens H
Jens H

Reputation: 4632

As far as I understood the working of Linq it must have something to do with the deferred execution of the queries.

In your fixed version, you have two Linq expressions that will be evaluated and interpreted one after the other, but executed only once when you enumerate the results (if ever). That neatly reduces database accesses.

The first version:

returnValue.Where(t => values.Contains(t.WorkItem.Requirement.Project.ProjectId))

does not do any filtering in the first step. It will do a check if the ProjectId of the current table row is contained in values for every row of the table!

Upvotes: 0

akatakritos
akatakritos

Reputation: 9858

I think in the first example, you are bringing the entire dataset over to your app and then doing an in-memory linq query against it.

Upvotes: 0

Related Questions