Reputation: 7223
I am using Entity Framework in a C# application and I am using Lazy Loading. We noticed one query has an extremely high impact on our CPU, which merely calculates a sum. When debugging the query generated by Entity Framework it creates a INNER JOIN (SELECT ...
which is not performant. When I manually change the query to a proper JOIN the query time goes from 1.3 sec to 0.03 sec.
Let me illustrate it with a simplified version of my code.
public decimal GetPortfolioValue(Guid portfolioId)
{
var value = DbContext.Portfolios
.Where( x => x.Id.Equals(portfolioId) )
.SelectMany( p => p.Items
.Where( i => i.Status == ItemStatusConstants.Subscribed
&& _activeStatuses.Contains( i.Category.Status ) )
)
.Select( i => i.Amount )
.DefaultIfEmpty(0)
.Sum();
return value;
}
This generates a query that selects the sum but does an inner join on a SELECT of two tables joined together. I created a pastebin here for the generated query not to pollute this question but a shortened version would be:
SELECT ...
FROM `portfolios` AS `Extent1`
INNER JOIN (SELECT
`Extent2`.*,
`Extent3`.*
FROM `items` AS `Extent2`
INNER JOIN `categories` AS `Extent3` ON `Extent3`.`id` =
`Extent2`.`category_id`) AS `Join1`
ON `Extent1`.`id` = `Join1`.`portfolio_id`
AND ((`Join1`.`status` = @gp1)
AND (`Join1`.`STATUS1` IN (@gp2, @gp3, @gp4, @gp5, @gp6)))
WHERE ...
The query I'd expect it to generate (and which takes 0.03 sec instead of 1.3 sec) would be something like
SELECT ...
FROM `portfolios` AS `Extent1`
INNER JOIN `items` AS `Extent2` ON `Extent2`.`portfolio_id` = `Extent1`.`id`
INNER JOIN `categories` AS `Extent3` ON `Extent3`.`id` = `Extent2`.`category_id`
AND ((`Extent2`.`status` = @gp1)
AND (`Extent3`.`status` IN (@gp2, @gp3, @gp4, @gp5, @gp6)))
WHERE ...
I suspect it's due to the .SelectMany
but I don't see how I should rewrite the LINQ query to make it more efficient. As for the Entities, the linking properties are virtual and have a foreign key configured:
public class Portfolio
{
public Guid Id { get; set; }
public virtual ICollection<Item> Items { get; set; }
}
public class Item
{
public Guid Id { get; set; }
public Guid PortfolioId { get; set; }
public Guid CategoryId { get; set; }
public decimal Amount { get; set; }
public string Status { get; set; }
public virtual Portfolio Portfolio { get; set; }
public virtual Category Category { get; set; }
}
public class Category
{
public Guid Id { get; set; }
public string Status { get; set; }
public virtual ICollection<Item> Items { get; set; }
}
Any help would be greatly appreciated!
Upvotes: 4
Views: 1196
Reputation: 457
As you dont need anything from Portfolio, just filter by PortfolioId, you could query directly PortfolioItems. Assuming your DbContext has a DbSet with all items in all portfolios, could be something like this:
var value = DbContext.PortfolioItems
.Where(i => i.PortfolioId == portfolioId && i.Status == ItemStatusConstants.Subscribed && _activeStatuses.Contains(i.Category.Status))
.Sum(i=>i.Amount);
I believe you don't need the DefaultIfEmpty nor the select if you use directly the appropiate Queryable.Sum overload.
EDITED: Tried two different LINQ queries without exposing a DbSet.
The first query is basically the same as yours:
var value2 = dbContext.Portfolios
.Where(p => p.Id == portfolioId)
.SelectMany(p => p.Items)
.Where(i => i.Status == "A" && _activeStatuses.Contains(i.Category.Status))
.Select(i=>i.Amount)
.DefaultIfEmpty()
.Sum();
Profiled the query in SQL Server (don't have MySql at hand) and produces an ugly sentence (parameters replaced and quotes unescaped for testing):
SELECT [GroupBy1].[a1] AS [C1]
FROM (SELECT Sum([Join2].[a1_0]) AS [A1]
FROM (SELECT CASE
WHEN ( [Project1].[c1] IS NULL ) THEN Cast(
0 AS DECIMAL(18))
ELSE [Project1].[amount]
END AS [A1_0]
FROM (SELECT 1 AS X) AS [SingleRowTable1]
LEFT OUTER JOIN
(SELECT [Extent1].[amount] AS [Amount],
Cast(1 AS TINYINT) AS [C1]
FROM [dbo].[items] AS [Extent1]
INNER JOIN [dbo].[categories] AS
[Extent2]
ON [Extent1].[categoryid] =
[Extent2].[id]
WHERE ( N'A' = [Extent1].[status] )
AND ( [Extent1].[portfolioid] =
'E2CC0CC2-066F-45C9-9D48-543D92C4C92E' )
AND ( [Extent2].[status] IN ( N'A', N'B', N'C' )
)
AND ( [Extent2].[status] IS NOT NULL )) AS
[Project1]
ON 1 = 1) AS [Join2]) AS [GroupBy1]
If we remove the "Select" and "DefaultIfEmpty" methods, and rewrite the query as this:
var value = dbContext.Portfolios
.Where(p => p.Id == portfolioId)
.SelectMany(p => p.Items)
.Where(i => i.Status == "A" && _activeStatuses.Contains(i.Category.Status))
.Sum(i => i.Amount);
The generated sentence is much cleaner:
SELECT [GroupBy1].[a1] AS [C1]
FROM (SELECT Sum([Extent1].[amount]) AS [A1]
FROM [dbo].[items] AS [Extent1]
INNER JOIN [dbo].[categories] AS [Extent2]
ON [Extent1].[categoryid] = [Extent2].[id]
WHERE ( N'A' = [Extent1].[status] )
AND ( [Extent1].[portfolioid] =
'E2CC0CC2-066F-45C9-9D48-543D92C4C92E' )
AND ( [Extent2].[status] IN ( N'A', N'B', N'C' ) )
AND ( [Extent2].[status] IS NOT NULL )) AS [GroupBy1]
Conclussion: We cannot rely on LINQ provider to create optimized queries. The linq query must be analyzed and optimized even before thinking in SQL sentence generated.
Upvotes: 5