How do I get EF6 to generate efficient SQL containing mulitple aggregate columns?

Question

I'm trying to get the Entity Framework (v6.1.3) to generate efficient SQL for a query with multiple aggregates.

Here's a simplified example.

Table:

CREATE TABLE [dbo].[CaseAttorney](
[CaseAttorneyID] [int] IDENTITY(1,1) NOT NULL,
[CaseNumber] [varchar](30) NOT NULL,
[AttorneyID] [int] NOT NULL,
[DateAssigned] [datetime] NULL,
[DateUnassigned] [datetime] NULL,
 CONSTRAINT [PK_CaseAttorney] PRIMARY KEY CLUSTERED 
(
    [CaseAttorneyID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

C#:

using (var cx = new DATA())
{

    var startDate = DateTime.Parse("1/1/2014");
    var endDate = startDate.AddDays(1);

    cx.Database.Log = Console.WriteLine;

    var res = cx.CaseAttorneys.
        GroupBy(o => new
        {
            AttorneyID = o.AttorneyID
        }).Select(g => new
        {
            AttorneyID = g.Key.AttorneyID,
            ActiveStart = g.Sum(item => (item.DateAssigned < startDate && (item.DateUnassigned == null || item.DateUnassigned >= startDate) ? 1 : 0)),
            Assigned = g.Sum(item => (item.DateAssigned >= startDate && item.DateAssigned <= endDate) ? 1 : 0)
        }).ToArray();

}

Instead of generating a query with a single GROUP BY, I get a very inefficient query containing multiple nested tables. This happens with both COUNTand SUM:

SELECT 
[Project3].[AttorneyID] AS [AttorneyID], 
[Project3].[C1] AS [C1], 
[Project3].[C2] AS [C2]
FROM ( SELECT 
    [Project2].[AttorneyID] AS [AttorneyID], 
    [Project2].[C1] AS [C1], 
    (SELECT 
        SUM([Filter2].[A1]) AS [A1]
        FROM ( SELECT 
            CASE WHEN (([Extent3].[DateAssigned] >= @p__linq__2) AND ([Extent3].[DateAssigned] <= @p__linq__3)) THEN 1 ELSE 0 END AS [A1]
            FROM [dbo].[CaseAttorney] AS [Extent3]
            WHERE [Project2].[AttorneyID] = [Extent3].[AttorneyID]
        )  AS [Filter2]) AS [C2]
    FROM ( SELECT 
        [Distinct1].[AttorneyID] AS [AttorneyID], 
        (SELECT 
            SUM([Filter1].[A1]) AS [A1]
            FROM ( SELECT 
                CASE WHEN (([Extent2].[DateAssigned] < @p__linq__0) AND (([Extent2].[DateUnassigned] IS NULL) OR ([Extent2].[DateUnassigned] >= @p__linq__1))) THEN 1 ELSE 0 END AS [A1]
                FROM [dbo].[CaseAttorney] AS [Extent2]
                WHERE [Distinct1].[AttorneyID] = [Extent2].[AttorneyID]
            )  AS [Filter1]) AS [C1]
        FROM ( SELECT DISTINCT 
            [Extent1].[AttorneyID] AS [AttorneyID]
            FROM [dbo].[CaseAttorney] AS [Extent1]
        )  AS [Distinct1]
    )  AS [Project2]
)  AS [Project3]

The nesting in and of itself wouldn't be too bad if it didn't keep hitting the same tables over and over again. This problem gets worse the more aggregate columns are added.

I haven't found any similar questions here, so I'm sure I'm doing something wrong.

What is the correct way of getting the Entity Framework to generate an efficient projection when I want to return multiple aggregate columns?

Ivan Stoev · Accepted Answer

Count(predicate) (and actually any function involving predicates) seems to have that effect on the generated SQL query.

However, the conditional Sum (i.e. Sum(predicate ? 1 : 0)) has no such affect~~, so the following will do what you want:~~

Update: It turns out that the Sum trick is necessary, but not enough when predicates use variables like in your case. It most probably is EF bug, because playing with different GroupBy overloads doesn't help, except if you include a temporary projection including conditional expressions before doing GroupBy.

So (finally) the following query

db.CaseAttorneys.Select(item => new
{
    Item = item,
    ActiveStart = item.DateAssigned < startDate && (item.DateUnassigned == null || item.DateUnassigned >= startDate) ? 1 : 0,
    Assigned = item.DateAssigned >= startDate && item.DateAssigned <= endDate ? 1 : 0
})
.GroupBy(o => new
{
    AttorneyID = o.Item.AttorneyID
})
.Select(g => new
{
    AttorneyID = g.Key.AttorneyID,
    ActiveStart = g.Sum(item => item.ActiveStart),
    Assigned = g.Sum(item => item.Assigned)
}).ToArray();

produced the desired SQL

SELECT 
    [GroupBy1].[K1] AS [AttorneyID], 
    [GroupBy1].[A1] AS [C1], 
    [GroupBy1].[A2] AS [C2]
    FROM ( SELECT 
        [Extent1].[K1] AS [K1], 
        SUM([Extent1].[A1]) AS [A1], 
        SUM([Extent1].[A2]) AS [A2]
        FROM ( SELECT 
            [Extent1].[AttorneyID] AS [K1], 
            CASE WHEN (([Extent1].[DateAssigned] < @p__linq__0) AND (([Extent1].[DateUnassigned] IS NULL) OR ([Extent1].[DateUnassigned] >= @p__linq__1))) THEN 1 ELSE 0 END AS [A1], 
            CASE WHEN (([Extent1].[DateAssigned] >= @p__linq__2) AND ([Extent1].[DateAssigned] <= @p__linq__3)) THEN 1 ELSE 0 END AS [A2]
            FROM [dbo].[CaseAttorneys] AS [Extent1]
        )  AS [Extent1]
        GROUP BY [K1]
    )  AS [GroupBy1]

How do I get EF6 to generate efficient SQL containing mulitple aggregate columns?

Answers (1)

Related Questions