Dai
Dai

Reputation: 155708

Generating efficient LEFT JOIN with COUNT in Linq to Entities

I have this schema:

Lists ( ListId, Name, DateCreated, ... )
ListItems( ListId, Text, Foo, Baz, Qux, ... )

I have an IQueryable<List> which represents another Linq query which returns some List entities.

I want to JOIN it with some aggregate data of ListItems, but this proving difficult as Linq is generating inefficient SQL - but I also want to make the query composable.

Here is something similar to the SQL I want Linq to generate:

SELECT
    *
FROM
(
    -- This part represents the IQueryable:

    SELECT
        ListId,
        Name,
        ...
    FROM
        Lists
    ORDER BY
        Lists.DateCreated
    OFFSET
        0 ROWS FETCH NEXT 25 ROWS ONLY -- Linq's .Skip(0).Take(25)
) AS ListsResult
LEFT JOIN
(
    -- This is the aggregate data query I want Linq to generate:
    SELECT
        ListId,
        COUNT(1) AS [Count],
        COUNT( CASE WHEN Foo = 'bar' THEN 1 ELSE NULL END ) AS CountFooBar,
        COUNT( CASE WHEN Baz > 5     THEN 1 ELSE NULL END ) AS CountBaz5
    FROM
        ListItems
    WHERE
        Qux IS NOT NULL
    GROUP BY
        ListId
) AS ItemsStats ON ListResults.ListId = ItemsStats.ListId

This is the Linq I have - I prefer the Extension Method syntax:

IQueryable lists = GetLists( 0, 25 );

var stats = this.dbContext.ListItems
    .Where( (ListItem li) => li.Qux != null )
    .GroupBy( (ListItem li) => li.ListId )
    .Select( grp => new
    {
        grp.Key,
        Count       = grp.Count(),
        CountFooBar = grp.Count( (ListItem li) => li.Foo == "bar" )
        CountBaz5   = grp.Count( (ListItem li) => li.Baz  > 5 )
    } )

return lists
    .Join(
        inner: stats,
        outerKeySelector: (List l) => l.ListId,
        innerKeySelector: grp => grp.Key,
        resultSelector: (list, itemStats) => new { list, itemsStats }
    )

However this generates SQL looking like this (this query shows my real table and column names, which is a bit more complicated than the schema I posted earlier:)

SELECT 
    [Project13].[C2] AS [C1], 
    [Project13].[ListId] AS [ListId], 
    [Project13].[C1] AS [C2], 
    [Project13].[C3] AS [C3], 
    [Project13].[C4] AS [C4], 
    [Project13].[C5] AS [C5], 
    [Project13].[C6] AS [C6], 
    [Project13].[C7] AS [C7]
    FROM ( SELECT 
        [Project11].[C1] AS [C1], 
        [Project11].[ListId] AS [ListId], 
        [Project11].[C2] AS [C2], 
        [Project11].[C3] AS [C3], 
        [Project11].[C4] AS [C4], 
        [Project11].[C5] AS [C5], 
        [Project11].[C6] AS [C6], 
        (SELECT 
            COUNT(1) AS [A1]
            FROM   (SELECT [Project12].[ListId] AS [ListId]
                FROM ( SELECT 
                    [Extent11].[ListId] AS [ListId], 
                    [Extent11].[Created] AS [Created]
                    FROM [dbo].[Lists] AS [Extent11]
                    WHERE ([Extent11].[TenantId] = 8) AND ([Extent11].[BlarghId] = 8)
                )  AS [Project12]
                ORDER BY [Project12].[Created] DESC
                OFFSET 0 ROWS FETCH NEXT 25 ROWS ONLY  ) AS [Limit6]
            INNER JOIN [dbo].[ListItems] AS [Extent12] ON ([Limit6].[TenantId] = [Extent12].[TenantId]) AND ([Limit6].[BlarghId] = [Extent12].[BlarghId]) AND ([Limit6].[ListId] = [Extent12].[ListId])
            WHERE (([Extent12].[Baz] > 0) OR ((LEN([Extent12].[Notes])) > 0) OR ((LEN([Extent12].[Value])) > 0)) AND ([Project11].[TenantId] = [Extent12].[TenantId]) AND ([Project11].[BlarghId] = [Extent12].[BlarghId]) AND ([Project11].[ListId] = [Extent12].[ListId]) AND ([Extent12].[RecommendationRevision] IS NOT NULL)) AS [C7]
        FROM ( SELECT 
            [Project9].[C1] AS [C1], 
            [Project9].[TenantId] AS [TenantId], 
            [Project9].[BlarghId] AS [BlarghId], 
            [Project9].[ListId] AS [ListId], 
            [Project9].[C2] AS [C2], 
            [Project9].[C3] AS [C3], 
            [Project9].[C4] AS [C4], 
            [Project9].[C5] AS [C5], 
            (SELECT 
                COUNT(1) AS [A1]
                FROM   (SELECT [Project10].[TenantId] AS [TenantId], [Project10].[BlarghId] AS [BlarghId], [Project10].[ListId] AS [ListId]
                    FROM ( SELECT 
                        [Extent9].[TenantId] AS [TenantId], 
                        [Extent9].[BlarghId] AS [BlarghId], 
                        [Extent9].[ListId] AS [ListId], 
                        [Extent9].[Created] AS [Created]
                        FROM [dbo].[Lists] AS [Extent9]
                        WHERE ([Extent9].[TenantId] = 8) AND ([Extent9].[BlarghId] = 8)
                    )  AS [Project10]
                    ORDER BY [Project10].[Created] DESC
                    OFFSET 0 ROWS FETCH NEXT 25 ROWS ONLY  ) AS [Limit5]
                INNER JOIN [dbo].[ListItems] AS [Extent10] ON ([Limit5].[TenantId] = [Extent10].[TenantId]) AND ([Limit5].[BlarghId] = [Extent10].[BlarghId]) AND ([Limit5].[ListId] = [Extent10].[ListId])
                WHERE (([Extent10].[Baz] > 0) OR ((LEN([Extent10].[Notes])) > 0) OR ((LEN([Extent10].[Value])) > 0)) AND ([Project9].[TenantId] = [Extent10].[TenantId]) AND ([Project9].[BlarghId] = [Extent10].[BlarghId]) AND ([Project9].[ListId] = [Extent10].[ListId]) AND (3 = [Extent10].[Baz])) AS [C6]
            FROM ( SELECT 
                [Project7].[C1] AS [C1], 
                [Project7].[TenantId] AS [TenantId], 
                [Project7].[BlarghId] AS [BlarghId], 
                [Project7].[ListId] AS [ListId], 
                [Project7].[C2] AS [C2], 
                [Project7].[C3] AS [C3], 
                [Project7].[C4] AS [C4], 
                (SELECT 
                    COUNT(1) AS [A1]
                    FROM   (SELECT [Project8].[TenantId] AS [TenantId], [Project8].[BlarghId] AS [BlarghId], [Project8].[ListId] AS [ListId]
                        FROM ( SELECT 
                            [Extent7].[TenantId] AS [TenantId], 
                            [Extent7].[BlarghId] AS [BlarghId], 
                            [Extent7].[ListId] AS [ListId], 
                            [Extent7].[Created] AS [Created]
                            FROM [dbo].[Lists] AS [Extent7]
                            WHERE ([Extent7].[TenantId] = 8) AND ([Extent7].[BlarghId] = 8)
                        )  AS [Project8]
                        ORDER BY [Project8].[Created] DESC
                        OFFSET 0 ROWS FETCH NEXT 25 ROWS ONLY  ) AS [Limit4]
                    INNER JOIN [dbo].[ListItems] AS [Extent8] ON ([Limit4].[TenantId] = [Extent8].[TenantId]) AND ([Limit4].[BlarghId] = [Extent8].[BlarghId]) AND ([Limit4].[ListId] = [Extent8].[ListId])
                    WHERE (([Extent8].[Baz] > 0) OR ((LEN([Extent8].[Notes])) > 0) OR ((LEN([Extent8].[Value])) > 0)) AND ([Project7].[TenantId] = [Extent8].[TenantId]) AND ([Project7].[BlarghId] = [Extent8].[BlarghId]) AND ([Project7].[ListId] = [Extent8].[ListId]) AND (2 = [Extent8].[Baz])) AS [C5]
                FROM ( SELECT 
                    [Project5].[C1] AS [C1], 
                    [Project5].[TenantId] AS [TenantId], 
                    [Project5].[BlarghId] AS [BlarghId], 
                    [Project5].[ListId] AS [ListId], 
                    [Project5].[C2] AS [C2], 
                    [Project5].[C3] AS [C3], 
                    (SELECT 
                        COUNT(1) AS [A1]
                        FROM   (SELECT [Project6].[TenantId] AS [TenantId], [Project6].[BlarghId] AS [BlarghId], [Project6].[ListId] AS [ListId]
                            FROM ( SELECT 
                                [Extent5].[TenantId] AS [TenantId], 
                                [Extent5].[BlarghId] AS [BlarghId], 
                                [Extent5].[ListId] AS [ListId], 
                                [Extent5].[Created] AS [Created]
                                FROM [dbo].[Lists] AS [Extent5]
                                WHERE ([Extent5].[TenantId] = 8) AND ([Extent5].[BlarghId] = 8)
                            )  AS [Project6]
                            ORDER BY [Project6].[Created] DESC
                            OFFSET 0 ROWS FETCH NEXT 25 ROWS ONLY  ) AS [Limit3]
                        INNER JOIN [dbo].[ListItems] AS [Extent6] ON ([Limit3].[TenantId] = [Extent6].[TenantId]) AND ([Limit3].[BlarghId] = [Extent6].[BlarghId]) AND ([Limit3].[ListId] = [Extent6].[ListId])
                        WHERE (([Extent6].[Baz] > 0) OR ((LEN([Extent6].[Notes])) > 0) OR ((LEN([Extent6].[Value])) > 0)) AND ([Project5].[TenantId] = [Extent6].[TenantId]) AND ([Project5].[BlarghId] = [Extent6].[BlarghId]) AND ([Project5].[ListId] = [Extent6].[ListId]) AND (1 = [Extent6].[Baz])) AS [C4]
                    FROM ( SELECT 
                        [Project3].[C1] AS [C1], 
                        [Project3].[TenantId] AS [TenantId], 
                        [Project3].[BlarghId] AS [BlarghId], 
                        [Project3].[ListId] AS [ListId], 
                        [Project3].[C2] AS [C2], 
                        (SELECT 
                            COUNT(1) AS [A1]
                            FROM   (SELECT [Project4].[TenantId] AS [TenantId], [Project4].[BlarghId] AS [BlarghId], [Project4].[ListId] AS [ListId]
                                FROM ( SELECT 
                                    [Extent3].[TenantId] AS [TenantId], 
                                    [Extent3].[BlarghId] AS [BlarghId], 
                                    [Extent3].[ListId] AS [ListId], 
                                    [Extent3].[Created] AS [Created]
                                    FROM [dbo].[Lists] AS [Extent3]
                                    WHERE ([Extent3].[TenantId] = 8) AND ([Extent3].[BlarghId] = 8)
                                )  AS [Project4]
                                ORDER BY [Project4].[Created] DESC
                                OFFSET 0 ROWS FETCH NEXT 25 ROWS ONLY  ) AS [Limit2]
                            INNER JOIN [dbo].[ListItems] AS [Extent4] ON ([Limit2].[TenantId] = [Extent4].[TenantId]) AND ([Limit2].[BlarghId] = [Extent4].[BlarghId]) AND ([Limit2].[ListId] = [Extent4].[ListId])
                            WHERE (([Extent4].[Baz] > 0) OR ((LEN([Extent4].[Notes])) > 0) OR ((LEN([Extent4].[Value])) > 0)) AND ([Project3].[TenantId] = [Extent4].[TenantId]) AND ([Project3].[BlarghId] = [Extent4].[BlarghId]) AND ([Project3].[ListId] = [Extent4].[ListId]) AND ([Extent4].[Foo] = 1)) AS [C3]
                        FROM ( SELECT 
                            [GroupBy1].[A1] AS [C1], 
                            [GroupBy1].[K1] AS [TenantId], 
                            [GroupBy1].[K2] AS [BlarghId], 
                            [GroupBy1].[K3] AS [ListId], 
                            [GroupBy1].[K4] AS [C2]
                            FROM ( SELECT 
                                [Project2].[K1] AS [K1], 
                                [Project2].[K2] AS [K2], 
                                [Project2].[K3] AS [K3], 
                                [Project2].[K4] AS [K4], 
                                COUNT([Project2].[A1]) AS [A1]
                                FROM ( SELECT 
                                    [Project2].[TenantId] AS [K1], 
                                    [Project2].[BlarghId] AS [K2], 
                                    [Project2].[ListId] AS [K3], 
                                    1 AS [K4], 
                                    1 AS [A1]
                                    FROM ( SELECT 
                                        [Extent2].[ListId] AS [ListId]
                                        FROM   (SELECT [Project1].[ListId] AS [ListId]
                                            FROM ( SELECT 
                                                [Extent1].[ListId] AS [ListId], 
                                                [Extent1].[Created] AS [Created]
                                                FROM [dbo].[Lists] AS [Extent1]
                                                WHERE ([Extent1].[TenantId] = 8) AND ([Extent1].[BlarghId] = 8)
                                            )  AS [Project1]
                                            ORDER BY [Project1].[Created] DESC
                                            OFFSET 0 ROWS FETCH NEXT 25 ROWS ONLY  ) AS [Limit1]
                                        INNER JOIN [dbo].[ListItems] AS [Extent2] ON (([Limit1].[ListId] = [Extent2].[ListId])
                                        WHERE ([Extent2].[Baz] > 0) OR ((LEN([Extent2].[Notes])) > 0) OR ((LEN([Extent2].[Value])) > 0)
                                    )  AS [Project2]
                                )  AS [Project2]
                                GROUP BY [K1], [K2], [K3], [K4]
                            )  AS [GroupBy1]
                        )  AS [Project3]
                    )  AS [Project5]
                )  AS [Project7]
            )  AS [Project9]
        )  AS [Project11]
    )  AS [Project13]

It doesn't compose the COUNT() statements together at all, and it moves the COUNT predicates to separate WHERE clauses. Also note the repeated paged subqueries (where OFFSET 0 ROW FETCH NEXT 25 is used) whereas my hand-written query only executes it once.

Upvotes: 2

Views: 481

Answers (2)

Ivan Stoev
Ivan Stoev

Reputation: 205899

In my experience EF generates such queries when you use aggregate functions that apply filtering like Count(predicate) in your case. You'll get much better SQL query if you replace Count(condition) construct with conditional sum (Sum(condition ? 1 : 0)) like this:

var stats = db.ListItems
    .Where(li => li.Qux != null)
    .GroupBy(li => li.ListId)
    .Select(grp => new
    {
        grp.Key,
        Count = grp.Count(),
        CountFooBar = grp.Sum(li => li.Foo == "bar" ? 1 : 0),
        CountBaz5 = grp.Sum(li => li.Baz > 5 ? 1 : 0)
    });

The other parts stay the same. Before the mod I was getting similar SQL to the posted, and here is what I'm getting after this little mod:

SELECT 
    [Limit1].[ListId] AS [ListId], 
    [Limit1].[Name] AS [Name], 
    [Limit1].[DateCreated] AS [DateCreated], 
    [GroupBy1].[K1] AS [ListId1], 
    [GroupBy1].[A1] AS [C1], 
    [GroupBy1].[A2] AS [C2], 
    [GroupBy1].[A3] AS [C3]
    FROM   (SELECT [Extent1].[ListId] AS [ListId], [Extent1].[Name] AS [Name], [Extent1].[DateCreated] AS [DateCreated]
        FROM [dbo].[List] AS [Extent1]
        ORDER BY [Extent1].[DateCreated] ASC
        OFFSET 0 ROWS FETCH NEXT 25 ROWS ONLY  ) AS [Limit1]
    INNER JOIN  (SELECT 
        [Filter1].[K1] AS [K1], 
        COUNT([Filter1].[A1]) AS [A1], 
        SUM([Filter1].[A2]) AS [A2], 
        SUM([Filter1].[A3]) AS [A3]
        FROM ( SELECT 
            [Extent2].[ListId] AS [K1], 
            1 AS [A1], 
            CASE WHEN (N'bar' = [Extent2].[Foo]) THEN 1 ELSE 0 END AS [A2], 
            CASE WHEN ([Extent2].[Baz] > 5) THEN 1 ELSE 0 END AS [A3]
            FROM [dbo].[ListItem] AS [Extent2]
            WHERE [Extent2].[Qux] IS NOT NULL
        )  AS [Filter1]
        GROUP BY [K1] ) AS [GroupBy1] ON [Limit1].[ListId] = [GroupBy1].[K1]

UPDATE: The above is just the LINQ equivalent of your manual SQL query (when you adjust the join code to produce LEFT OUTER JOIN). However, taking into account the master data paging, OUTER APPLY SQL query might perform better. You can let LINQ to Entities generate such type of query like this:

var listsWithItemsStats = lists
    .SelectMany(l => db.ListItems
        .Where(li => li.ListId == l.ListId && li.Qux != null)
        .GroupBy(li => li.ListId)
        .Select(grp => new
        {
            grp.Key,
            Count = grp.Count(),
            CountFooBar = grp.Sum(li => li.Foo == "bar" ? 1 : 0),
            CountBaz5 = grp.Sum(li => li.Baz > 5 ? 1 : 0)
        })
        .DefaultIfEmpty(),
        (list, itemsStats) => new { list, itemsStats });

Upvotes: 1

Dai
Dai

Reputation: 155708

Here's a semi-workaround I have:

I realised the best short-term solution is to have the SQL in the database (as a UDF FUNCTION or VIEW), this would mean that some data code would therefore have to be in the database (rather than using the DB as a "dumb store").

I first created a table-valued UDF which would accept a table-valued parameter, reasoning that would allow for composition, at the cost of needing to generate the input parameter table (an array of ListId values from the paged query). However in doing so, I realised that Linq-to-Entities (version 6) does not support table-valued parameters in Function Imports yet.

I then reasoned a better approach would be to move the COUNT operation to a VIEW (which represents one half of the LEFT JOIN I had in my hand-written query), and then I can get Linq to JOIN that against the existing IQueryable, thus retaining composability and generating an efficient runtime query (indeed, when I run it, the query takes 34ms to execute according to SQL Server Profiler, whereas the old Linq-generated inefficient query took 830ms).

Here's what I used:

CREATE VIEW ListItemStatistics AS

SELECT
    ListId,

    COUNT(*) AS [CountAll],
    COUNT( CASE WHEN ... ) AS Count...

FROM
    ListItems
WHERE
    Foo = 'bar'
GROUP BY
     ListId

And then from within Linq:

IQueryable lists = GetListsQuery( 0, 25 );

var listsWithItemsStats = lists.
    .Join(
        inner: this.dbContext.ListItemStatistics,
        outerKeySelector: list => list.ListId,
        innerKeySelector: row => row.ListId,
        resultSelector: (list,row) => new { list, row }
    );

However because this does use database-side logic (in the VIEW) it is not ideal.

Upvotes: 1

Related Questions