Reputation: 839
I noticed that LINQ to Objects has a GroupBy method. In light of this, I was wondering if anyone can build a case for grouping a list of objects in memory using LINQ vs having SQL Server perform the grouping?
Upvotes: 2
Views: 2261
Reputation: 23108
As already stated there may be several reasons for choosing grouping client-side (C#) vs. server-side (SQL Server). I decided to make a small test - the following query should decide what pairs of articles and users are missing from a specific computation.
The query returns about 1.5M records grouped in more than 15000 groups.
Grouping in SQL
allArticleConcepts = DataAccess.ArticleConceptRepository.AllNoTracking
.Join(DataAccess.ArticleAnalysisDataRepository.AllNoTracking.Where(aa => aa.CommentCount >= minCommentCount),
outer => outer.ArticleId,
inner => inner.ArticleId,
(outer, inner) => outer)
.Where(ac => missingXData.Any(x => x.ArticleId == ac.ArticleId))
.GroupBy(ac => ac.ArticleId)
.ToDictionary(grp => grp.Key, grp => grp
.Select(ac => new Concept { ContextSynLexemId = ac.LexemId, LexemId = ac.LexemId, Frequency = ac.Freq })
.ToList());
Generated query is very complex and thus taking a long time.
Grouping in .NET
allArticleConcepts = DataAccess.ArticleConceptRepository.AllNoTracking
.Join(DataAccess.ArticleAnalysisDataRepository.AllNoTracking.Where(aa => aa.CommentCount >= minCommentCount),
outer => outer.ArticleId,
inner => inner.ArticleId,
(outer, inner) => outer)
.Where(ac => missingXData.Any(x => x.ArticleId == ac.ArticleId))
.ToList()
.GroupBy(ac => ac.ArticleId)
.ToDictionary(grp => grp.Key, grp => grp
.Select(ac => new Concept { ContextSynLexemId = ac.LexemId, LexemId = ac.LexemId, Frequency = ac.Freq })
.ToList());
Generated SQL is much simpler and faster. However, C# computation is slightly slower.
Grouping in .NET with some parallel flavor
allArticleConcepts = DataAccess.ArticleConceptRepository.AllNoTracking
.Join(DataAccess.ArticleAnalysisDataRepository.AllNoTracking.Where(aa => aa.CommentCount >= minCommentCount),
outer => outer.ArticleId,
inner => inner.ArticleId,
(outer, inner) => outer)
.Where(ac => missingXData.Any(x => x.ArticleId == ac.ArticleId))
.ToList()
.AsParallel()
.GroupBy(ac => ac.ArticleId)
.ToDictionary(grp => grp.Key, grp => grp
.Select(ac => new Concept { ContextSynLexemId = ac.LexemId, LexemId = ac.LexemId, Frequency = ac.Freq })
.ToList());
This gives a minor improvement on the client side.
As a conclusion on how to perform grouping:
Note: if query deals with large number of entities, but the result is relatively small, one can consider direct query execution (or stored procedure) and map the result to entities. This ensures maximum flexibility for writing the query and minimizing C# - SQL Server round-trip time.
Upvotes: 1
Reputation: 25426
Some reasons you might want to group using LINQ on the client are:
data.GroupBy(d => DoSomethingComplicatedWith(d))
.Upvotes: 6
Reputation: 2648
There are (not too few) situations where your objects are not contained in a database.
Upvotes: 1
Reputation: 28046
If you are dealing with a small set of data, performance on the client is not a concern, and going back to the database again to get the data into the format you want is not an option (or an undesirable option), then doing it in memory is fine.
In all other cases, your best option would be to let the database do this work, as it is optimized for exactly this kind of operation.
Upvotes: 1
Reputation: 6524
On a large data set with properly indexed tables, SQL Server will be faster. Every time. Hands down. For smaller datasets, you might not even notice a difference.
Upvotes: 4