Reputation: 787
I'm trying to get the Entity Framework (v6.1.3) to generate efficient SQL for a query with multiple aggregates.
Here's a simplified example.
Table:
CREATE TABLE [dbo].[CaseAttorney](
[CaseAttorneyID] [int] IDENTITY(1,1) NOT NULL,
[CaseNumber] [varchar](30) NOT NULL,
[AttorneyID] [int] NOT NULL,
[DateAssigned] [datetime] NULL,
[DateUnassigned] [datetime] NULL,
CONSTRAINT [PK_CaseAttorney] PRIMARY KEY CLUSTERED
(
[CaseAttorneyID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
C#:
using (var cx = new DATA())
{
var startDate = DateTime.Parse("1/1/2014");
var endDate = startDate.AddDays(1);
cx.Database.Log = Console.WriteLine;
var res = cx.CaseAttorneys.
GroupBy(o => new
{
AttorneyID = o.AttorneyID
}).Select(g => new
{
AttorneyID = g.Key.AttorneyID,
ActiveStart = g.Sum(item => (item.DateAssigned < startDate && (item.DateUnassigned == null || item.DateUnassigned >= startDate) ? 1 : 0)),
Assigned = g.Sum(item => (item.DateAssigned >= startDate && item.DateAssigned <= endDate) ? 1 : 0)
}).ToArray();
}
Instead of generating a query with a single GROUP BY
, I get a very inefficient query containing multiple nested tables. This happens with both COUNT
and SUM
:
SELECT
[Project3].[AttorneyID] AS [AttorneyID],
[Project3].[C1] AS [C1],
[Project3].[C2] AS [C2]
FROM ( SELECT
[Project2].[AttorneyID] AS [AttorneyID],
[Project2].[C1] AS [C1],
(SELECT
SUM([Filter2].[A1]) AS [A1]
FROM ( SELECT
CASE WHEN (([Extent3].[DateAssigned] >= @p__linq__2) AND ([Extent3].[DateAssigned] <= @p__linq__3)) THEN 1 ELSE 0 END AS [A1]
FROM [dbo].[CaseAttorney] AS [Extent3]
WHERE [Project2].[AttorneyID] = [Extent3].[AttorneyID]
) AS [Filter2]) AS [C2]
FROM ( SELECT
[Distinct1].[AttorneyID] AS [AttorneyID],
(SELECT
SUM([Filter1].[A1]) AS [A1]
FROM ( SELECT
CASE WHEN (([Extent2].[DateAssigned] < @p__linq__0) AND (([Extent2].[DateUnassigned] IS NULL) OR ([Extent2].[DateUnassigned] >= @p__linq__1))) THEN 1 ELSE 0 END AS [A1]
FROM [dbo].[CaseAttorney] AS [Extent2]
WHERE [Distinct1].[AttorneyID] = [Extent2].[AttorneyID]
) AS [Filter1]) AS [C1]
FROM ( SELECT DISTINCT
[Extent1].[AttorneyID] AS [AttorneyID]
FROM [dbo].[CaseAttorney] AS [Extent1]
) AS [Distinct1]
) AS [Project2]
) AS [Project3]
The nesting in and of itself wouldn't be too bad if it didn't keep hitting the same tables over and over again. This problem gets worse the more aggregate columns are added.
I haven't found any similar questions here, so I'm sure I'm doing something wrong.
What is the correct way of getting the Entity Framework to generate an efficient projection when I want to return multiple aggregate columns?
Upvotes: 1
Views: 535
Reputation: 205739
Count(predicate)
(and actually any function involving predicates) seems to have that effect on the generated SQL query.
However, the conditional Sum
(i.e. Sum(predicate ? 1 : 0)
) has no such affect, so the following will do what you want:
Update: It turns out that the Sum
trick is necessary, but not enough when predicates use variables like in your case. It most probably is EF bug, because playing with different GroupBy
overloads doesn't help, except if you include a temporary projection including conditional expressions before doing GroupBy
.
So (finally) the following query
db.CaseAttorneys.Select(item => new
{
Item = item,
ActiveStart = item.DateAssigned < startDate && (item.DateUnassigned == null || item.DateUnassigned >= startDate) ? 1 : 0,
Assigned = item.DateAssigned >= startDate && item.DateAssigned <= endDate ? 1 : 0
})
.GroupBy(o => new
{
AttorneyID = o.Item.AttorneyID
})
.Select(g => new
{
AttorneyID = g.Key.AttorneyID,
ActiveStart = g.Sum(item => item.ActiveStart),
Assigned = g.Sum(item => item.Assigned)
}).ToArray();
produced the desired SQL
SELECT
[GroupBy1].[K1] AS [AttorneyID],
[GroupBy1].[A1] AS [C1],
[GroupBy1].[A2] AS [C2]
FROM ( SELECT
[Extent1].[K1] AS [K1],
SUM([Extent1].[A1]) AS [A1],
SUM([Extent1].[A2]) AS [A2]
FROM ( SELECT
[Extent1].[AttorneyID] AS [K1],
CASE WHEN (([Extent1].[DateAssigned] < @p__linq__0) AND (([Extent1].[DateUnassigned] IS NULL) OR ([Extent1].[DateUnassigned] >= @p__linq__1))) THEN 1 ELSE 0 END AS [A1],
CASE WHEN (([Extent1].[DateAssigned] >= @p__linq__2) AND ([Extent1].[DateAssigned] <= @p__linq__3)) THEN 1 ELSE 0 END AS [A2]
FROM [dbo].[CaseAttorneys] AS [Extent1]
) AS [Extent1]
GROUP BY [K1]
) AS [GroupBy1]
Upvotes: 1