Snæbjørn
Snæbjørn

Reputation: 10792

Best way to select entries where navigation property is null

I've run into a bit of a performance problem. I got the following database model. Model

I wish to select all entries in Entity2 that doesn't have a foreign key in Entity1, in other words the navigation property must be null.

I've come up with the following LINQ query context.Entity2Set.Where(x => x.Entity1 == null); and it works, however it's slow. So I was wondering if there was a better (faster) way (other then indexing) to select the entries.

In the end I want to select the entries from Entity4, so performance is an issue.

Upvotes: 0

Views: 550

Answers (2)

Slauma
Slauma

Reputation: 177133

You are querying for the principal in the relationship (Entity2), not the dependent (Entity1). This means that the foreign key is in table Entity1 and the generated SQL query isn't just a query for a column value in table Entity2. In order to get the desired result the SQL query must join (by LEFT OUTER JOIN) the two tables.

If it were a one-to-many relationship your query is like: "Fetch all orders that have no order items." This can't be achieved by inspecting the Order table alone, because the foreign key is in table OrderItem and the query must join the two tables. Indeed in this example the query's performance could be improved by indexing the foreign key column in table OrderItem.

Unfortunely in your example you can't improve your query by indexing a column anymore because you have a one-to-one relationship (not one-to-many). Since EF's one-to-one relationships are always shared primary key associations the foreign key in table Entity1 is its primary key Id at the same time.

That means that your foreign key column is already indexed - namely by the clustered primary key index in table Entity1.

The generated SQL query is like so:

SELECT 
[Extent1].[Id] AS [Id], 
[Extent1].[OtherColumn] AS [OtherColumn]
FROM   [dbo].[Entity2] AS [Extent1]
LEFT OUTER JOIN [dbo].[Entity1] AS [Extent2] ON [Extent1].[Id] = [Extent2].[Id]
LEFT OUTER JOIN [dbo].[Entity1] AS [Extent3] ON [Extent2].[Id] = [Extent3].[Id]
WHERE [Extent3].[Id] IS NULL

As you can see the tables are joined by the indexed Id columns. The second and redundant LEFT OUTER JOIN is a flaw in Entity Framework's SQL generation that was in effect for EF < 5.0. I am not sure if it is solved in EF >= 5.0. I don't know if it really matters for the query performance because the line joins the table with itself by the same column, so I'd hope the SQL query optimizer will recognize it and ignore the line in the execution plan.

In essence: Your LINQ query is fine and you can't improve the performance, even not by adding any additional index to a column.

However, what I would suggest is to find out what is really slow. Is it indeed the SQL query, or is it translation from the LINQ query to SQL, or is it perhaps object materialization of the returned result set, etc.? The first step I would do is pasting the generated SQL query into SQL Server Management Studio and see how the performance is there. Even though you would not change the LINQ query there might be other opportunities to improve the performance.

Upvotes: 2

qujck
qujck

Reputation: 14580

You've said

without indexes

but for this problem that would be a mistake. An index will find all null references almost instantaneously.

Upvotes: 0

Related Questions