Snæbjørn

Reputation: 10792

Optimize/rewrite LINQ query with GROUP BY and COUNT

I'm trying to get a count of unique Foos and Bars grouped by Name, on the following data set.

Id  |   IsActive    |   Name    |   Foo     |   Bar
1   |       1       |   A       |   11      |   null
2   |       1       |   A       |   11      |   null
3   |       1       |   A       |   null    |   123
4   |       1       |   B       |   null    |   321

I expect the result on the above data to be:

Expected:
A = 2;
B = 1;

I tried grouping by Name, Foo, and Bar, and then grouping by Name again with a count to get the "row" count, but that didn't give me the correct result (or the ToDictionary threw a duplicate key exception; I played around with this a lot, so I can't quite remember).

db.MyEntity
    .Where(x => x.IsActive)
    .GroupBy(x => new { x.Name, x.Foo, x.Bar })
    .GroupBy(x => new { x.Key.Name, Count = x.Count() })
    .ToDictionary(x => x.Key, x => x.Count);

So I came up with this LINQ query, but it's rather slow.

db.MyEntity
    .Where(x => x.IsActive)
    .GroupBy(x => x.Name)
    .ToDictionary(x => x.Key,
        x =>
            x.Where(y => y.Foo != null).Select(y => y.Foo).Distinct().Count() +
            x.Where(y => y.Bar != null).Select(y => y.Bar).Distinct().Count());

How can I optimize it?

Here's the entity for reference:

public class MyEntity
{
    public int Id { get; set; }
    public bool IsActive { get; set; }
    public string Name { get; set; }
    public int? Foo { get; set; }
    public int? Bar { get; set; }
}

Edit

I also tried this query

db.MyEntity
    .Where(x => x.IsActive)
    .GroupBy(x => new { x.Name, x.Foo, x.Bar })
    .GroupBy(x => x.Key.Name)
    .ToDictionary(x => x.Key, x => x.Count());

But that threw a timeout exception :(

Upvotes: 5

Views: 2816

Answers (4)

D Stanley

Reputation: 152566

I think you can just modify your initial query slightly to get what you want:

db.MyEntity
    .Where(x => x.IsActive)
    .GroupBy(x => new { x.Name, x.Foo, x.Bar })
    .GroupBy(x => x.Key.Name)
    .ToDictionary(x => x.Key, x => x.Count());

When you include Count() in the key of the second grouping, you are counting the duplicate rows within each three-part key. You only want to count the distinct three-part keys, so group by Name alone and call Count() afterwards.
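To make the difference concrete, here is a minimal LINQ to Objects sketch (no database involved) that compares counting distinct (Name, Foo, Bar) keys with the per-column distinct count from the question. The names SampleRow, TripleKeySketch, QuestionRows and BothSetRows are made up for illustration, and the "C" rows are invented data that do not appear in the question. On the question's rows both approaches agree; they can differ when a row has both Foo and Bar set, so which one is wanted depends on the real data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for MyEntity; not the real EF entity.
public class SampleRow
{
    public string Name { get; set; }
    public int? Foo { get; set; }
    public int? Bar { get; set; }
}

public static class TripleKeySketch
{
    // Counts distinct (Name, Foo, Bar) combinations per Name.
    public static Dictionary<string, int> CountTriples(IEnumerable<SampleRow> rows)
    {
        return rows
            .GroupBy(r => new { r.Name, r.Foo, r.Bar })
            .GroupBy(g => g.Key.Name)
            .ToDictionary(g => g.Key, g => g.Count());
    }

    // Counts distinct non-null Foo values plus distinct non-null Bar values per Name.
    public static Dictionary<string, int> CountPerColumn(IEnumerable<SampleRow> rows)
    {
        return rows
            .GroupBy(r => r.Name)
            .ToDictionary(
                g => g.Key,
                g => g.Where(r => r.Foo != null).Select(r => r.Foo).Distinct().Count()
                   + g.Where(r => r.Bar != null).Select(r => r.Bar).Distinct().Count());
    }

    // The four active rows from the question.
    public static List<SampleRow> QuestionRows()
    {
        return new List<SampleRow>
        {
            new SampleRow { Name = "A", Foo = 11, Bar = null },
            new SampleRow { Name = "A", Foo = 11, Bar = null },
            new SampleRow { Name = "A", Foo = null, Bar = 123 },
            new SampleRow { Name = "B", Foo = null, Bar = 321 },
        };
    }

    // Invented rows (assumption) where both columns are set on the same row.
    public static List<SampleRow> BothSetRows()
    {
        return new List<SampleRow>
        {
            new SampleRow { Name = "C", Foo = 1, Bar = 10 },
            new SampleRow { Name = "C", Foo = 1, Bar = 20 },
        };
    }

    public static void Main()
    {
        Console.WriteLine(CountTriples(QuestionRows())["A"]);   // 2
        Console.WriteLine(CountPerColumn(QuestionRows())["A"]); // 2
        Console.WriteLine(CountTriples(BothSetRows())["C"]);    // 2 distinct triples
        Console.WriteLine(CountPerColumn(BothSetRows())["C"]);  // 1 Foo + 2 Bars = 3
    }
}
```

This only illustrates the counting logic in memory; it says nothing about how Entity Framework translates either query to SQL.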

Upvotes: 0

Giorgi Nakeuri

Reputation: 35780

Your aim is to produce the following query:

select Name, count(distinct Foo) + count(distinct Bar)
from myEntity
where IsActive = 1
group by Name

This is the minimal query to get what you want. But LINQ seems to overcomplicate everything as much as possible :)

The goal is to do as much as possible at the database level. Currently, your query is translated to:

SELECT 
    [Project2].[C1] AS [C1], 
    [Project2].[Name] AS [Name], 
    [Project2].[C2] AS [C2], 
    [Project2].[id] AS [id], 
    [Project2].[IsActive] AS [IsActive], 
    [Project2].[Name1] AS [Name1], 
    [Project2].[Foo] AS [Foo], 
    [Project2].[Bar] AS [Bar]
    FROM ( SELECT 
        [Distinct1].[Name] AS [Name], 
        1 AS [C1], 
        [Extent2].[id] AS [id], 
        [Extent2].[IsActive] AS [IsActive], 
        [Extent2].[Name] AS [Name1], 
        [Extent2].[Foo] AS [Foo], 
        [Extent2].[Bar] AS [Bar], 
        CASE WHEN ([Extent2].[id] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C2]
        FROM   (SELECT DISTINCT 
            [Extent1].[Name] AS [Name]
            FROM [dbo].[SomeTable] AS [Extent1]
            WHERE [Extent1].[IsActive] = 1 ) AS [Distinct1]
        LEFT OUTER JOIN [dbo].[SomeTable] AS [Extent2] ON ([Extent2].[IsActive] = 1) AND ([Distinct1].[Name] = [Extent2].[Name])
    )  AS [Project2]
    ORDER BY [Project2].[Name] ASC, [Project2].[C2] ASC

It selects every active row from the database and performs the grouping and counting at the application layer, which is inefficient.

The query of @Servy:

var activeItems = db.MyEntity.Where(x => x.IsActive);

var query = activeItems.Select(x => new { x.Name, Value = x.Foo }).Distinct()
.Concat(activeItems.Select(x => new { x.Name, Value = x.Bar }).Distinct())
.Where(x => x.Value != null)
.GroupBy(pair => pair.Name)
.Select(group => new { group.Key, Count = group.Count() })
.ToDictionary(pair => pair.Key, pair => pair.Count);

is translated to:

SELECT 
1 AS [C1], 
[GroupBy1].[K1] AS [C2], 
[GroupBy1].[A1] AS [C3]
FROM ( SELECT 
    [UnionAll1].[Name] AS [K1], 
    COUNT(1) AS [A1]
    FROM  (SELECT 
        [Distinct1].[Name] AS [Name]
        FROM ( SELECT DISTINCT 
            [Extent1].[Name] AS [Name], 
            [Extent1].[Foo] AS [Foo]
            FROM [dbo].[SomeTable] AS [Extent1]
            WHERE ([Extent1].[IsActive] = 1) AND ([Extent1].[Foo] IS NOT NULL)
        )  AS [Distinct1]
    UNION ALL
        SELECT 
        [Distinct2].[Name] AS [Name]
        FROM ( SELECT DISTINCT 
            [Extent2].[Name] AS [Name], 
            [Extent2].[Bar] AS [Bar]
            FROM [dbo].[SomeTable] AS [Extent2]
            WHERE ([Extent2].[IsActive] = 1) AND ([Extent2].[Bar] IS NOT NULL)
        )  AS [Distinct2]) AS [UnionAll1]
    GROUP BY [UnionAll1].[Name]
)  AS [GroupBy1]

It is much better.

I have tried the following:

var activeItems = (from o in db.SomeTables
                   where o.IsActive
                   group o by o.Name into gr
                   select new { gr.Key, cc = gr.Select(c => c.Foo).Distinct().Count(c => c != null) + 
                                             gr.Select(c => c.Bar).Distinct().Count(c => c != null) }).ToDictionary(c => c.Key);

This is translated to:

SELECT 
1 AS [C1], 
[Project5].[Name] AS [Name], 
[Project5].[C1] + [Project5].[C2] AS [C2]
FROM ( SELECT 
    [Project3].[Name] AS [Name], 
    [Project3].[C1] AS [C1], 
    (SELECT 
        COUNT(1) AS [A1]
        FROM ( SELECT DISTINCT 
            [Extent3].[Bar] AS [Bar]
            FROM [dbo].[SomeTable] AS [Extent3]
            WHERE ([Extent3].[IsActive] = 1) AND ([Project3].[Name] = [Extent3].[Name]) AND ([Extent3].[Bar] IS NOT NULL)
        )  AS [Distinct3]) AS [C2]
    FROM ( SELECT 
        [Distinct1].[Name] AS [Name], 
        (SELECT 
            COUNT(1) AS [A1]
            FROM ( SELECT DISTINCT 
                [Extent2].[Foo] AS [Foo]
                FROM [dbo].[SomeTable] AS [Extent2]
                WHERE ([Extent2].[IsActive] = 1) AND ([Distinct1].[Name] = [Extent2].[Name]) AND ([Extent2].[Foo] IS NOT NULL)
            )  AS [Distinct2]) AS [C1]
        FROM ( SELECT DISTINCT 
            [Extent1].[Name] AS [Name]
            FROM [dbo].[SomeTable] AS [Extent1]
            WHERE [Extent1].[IsActive] = 1
        )  AS [Distinct1]
    )  AS [Project3]
)  AS [Project5]

This is much the same as the second version, but it uses correlated subqueries instead of a UNION ALL.

Conclusion:

I would create a view and import it into the model if the table is quite large and performance is crucial. Otherwise, stick with the 3rd version or the 2nd version from @Servy. Performance should be tested, of course.

Upvotes: 1

Servy

Reputation: 203835

The query is extremely inefficient because you're doing much of the work (everything involved in building the dictionary) on the client side, without being able to use the database to do your projections. This is a problem both because the database (especially if these values are indexed) can do this work faster than the client, and also because doing the projections on the database involves much less data being sent over the network.

So simply do your projections before you group the data.

var activeItems = db.MyEntity.Where(x => x.IsActive);

var query = activeItems.Select(x => new { x.Name, Value = x.Foo }).Distinct()
    .Concat(activeItems.Select(x => new { x.Name, Value = x.Bar }).Distinct())
    .Where(x => x.Value != null)
    .GroupBy(pair => pair.Name)
    .Select(group => new { group.Key, Count = group.Count() })
    .ToDictionary(pair => pair.Key, pair => pair.Count);
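As a sanity check of the logic only, the same project-then-concat pipeline can be run with LINQ to Objects on the rows from the question. This is a sketch under assumptions: EntityRow and ProjectionSketch are hypothetical names, the list stands in for db.MyEntity, and nothing here shows what SQL Entity Framework would generate.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory copy of the entity's relevant columns.
public class EntityRow
{
    public bool IsActive { get; set; }
    public string Name { get; set; }
    public int? Foo { get; set; }
    public int? Bar { get; set; }
}

public static class ProjectionSketch
{
    // Projects (Name, Foo) and (Name, Bar) pairs, removes duplicates and nulls,
    // then counts the remaining pairs per Name.
    public static Dictionary<string, int> CountByName(IEnumerable<EntityRow> rows)
    {
        var activeItems = rows.Where(x => x.IsActive);

        return activeItems.Select(x => new { x.Name, Value = x.Foo }).Distinct()
            .Concat(activeItems.Select(x => new { x.Name, Value = x.Bar }).Distinct())
            .Where(pair => pair.Value != null)
            .GroupBy(pair => pair.Name)
            .ToDictionary(g => g.Key, g => g.Count());
    }

    // The four rows from the question's data set.
    public static List<EntityRow> QuestionRows()
    {
        return new List<EntityRow>
        {
            new EntityRow { IsActive = true, Name = "A", Foo = 11, Bar = null },
            new EntityRow { IsActive = true, Name = "A", Foo = 11, Bar = null },
            new EntityRow { IsActive = true, Name = "A", Foo = null, Bar = 123 },
            new EntityRow { IsActive = true, Name = "B", Foo = null, Bar = 321 },
        };
    }

    public static void Main()
    {
        var counts = CountByName(QuestionRows());
        Console.WriteLine("A = " + counts["A"]); // A = 2
        Console.WriteLine("B = " + counts["B"]); // B = 1
    }
}
```

Both projections produce the same anonymous type (a string Name and an int? Value), which is what lets Concat combine them before the null filter and the grouping.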

Upvotes: 5

RASKOLNIKOV

Reputation: 748

My only advice on this question is to avoid DISTINCT for better performance. Use grouping instead.

Please have a look at this link

Upvotes: -2
