Reputation: 7898
I've been trying to get a Linq query to return distinct values from a collection. I've found two ways to go about it; either use GroupBy or Distinct. I know that Distinct was made for the job but I have to implement IEquatable on the object.
I tried GroupBy and that worked just fine. I want to know if using Distinct vs GroupBy has a distinct performance advantage.
Upvotes: 20
Views: 15086
Reputation: 158
Old question but I stumbled on this just now and decided to run a benchmark with BenchmarkDotNet on .NET 8. Note: it's not specifically stated in the question and I assumed it's asking for the in-memory performance.
[Benchmark]
public bool GroupBy_Count_IEnumerable()
{
return list.GroupBy(x => x.Property).Count() > 1;
}
[Benchmark]
public bool SelectDistinct_Count_IEnumerable()
{
return list.Select(x => x.Property).Distinct().Count() > 1;
}
Result is:
Select(x => x.Property).Distinct()
is faster than GroupBy(x => x.Property)
.
200 items:
Method | Mean | Error | StdDev |
---|---|---|---|
GroupBy_Count_IEnumerable | 11.537 us | 0.4056 us | 1.1703 us |
SelectDistinct_Count_IEnumerable | 5.660 us | 0.0894 us | 0.0792 us |
50_000 items:
Method | Mean | Error | StdDev | Median |
---|---|---|---|---|
GroupBy_Count_IEnumerable | 7.750 ms | 0.1527 ms | 0.2978 ms | 7.631 ms |
SelectDistinct_Count_IEnumerable | 5.352 ms | 0.1064 ms | 0.2747 ms | 5.280 ms |
Upvotes: 2
Reputation: 236278
Distinct()
will compare entire objects in collection (for reference types you need GetHashCode and Equals to be overridden). It will enumerate items and just add them to set. Simple and fast. Something like:
Set<TSource> set = new Set<TSource>(comparer);
foreach (TSource tSource in source)
{
if (!set.Add(tSource))
continue;
yield return tSource;
}
GroupBy()
allows you to group object by some key. In this case keys will be compared. It will need to execute key selector lambda for each item in collection. Also it will need to create grouping for each distinct key and add each item in collection to its group:
Func<TSource, TElement> elementSelector = x => x;
<TKey, TElement> lookup = new Lookup<TKey, TElement>(comparer);
foreach (TSource tSource in source)
{
TKey key = keySelector(tSource);
// simplified pseudo-code
if (!lookup.Contains(key))
lookup.Add(new Grouping(key));
lookup[key].Add(elementSelector(tSource));
}
foreach(IGrouping<TKey, TElement> grouping in lookup)
yield return grouping;
So, I think GroupBy()
is not that fast as simple Distict()
.
Upvotes: 23