Varun Rathore
Varun Rathore

Reputation: 7898

Linq: GroupBy vs Distinct

I've been trying to get a Linq query to return distinct values from a collection. I've found two ways to go about it; either use GroupBy or Distinct. I know that Distinct was made for the job but I have to implement IEquatable on the object.

I tried GroupBy and that worked just fine. I want to know if using Distinct vs GroupBy has a distinct performance advantage.

Upvotes: 20

Views: 15086

Answers (2)

Cosmin Leferman
Cosmin Leferman

Reputation: 158

Old question but I stumbled on this just now and decided to run a benchmark with BenchmarkDotNet on .NET 8. Note: it's not specifically stated in the question and I assumed it's asking for the in-memory performance.

[Benchmark]
public bool GroupBy_Count_IEnumerable()
{
    return list.GroupBy(x => x.Property).Count() > 1;
}

[Benchmark]
public bool SelectDistinct_Count_IEnumerable()
{
    return list.Select(x => x.Property).Distinct().Count() > 1;
}

Result is: Select(x => x.Property).Distinct() is faster than GroupBy(x => x.Property).

200 items:

Method Mean Error StdDev
GroupBy_Count_IEnumerable 11.537 us 0.4056 us 1.1703 us
SelectDistinct_Count_IEnumerable 5.660 us 0.0894 us 0.0792 us

50_000 items:

Method Mean Error StdDev Median
GroupBy_Count_IEnumerable 7.750 ms 0.1527 ms 0.2978 ms 7.631 ms
SelectDistinct_Count_IEnumerable 5.352 ms 0.1064 ms 0.2747 ms 5.280 ms

Upvotes: 2

Sergey Berezovskiy
Sergey Berezovskiy

Reputation: 236278

Distinct() will compare entire objects in collection (for reference types you need GetHashCode and Equals to be overridden). It will enumerate items and just add them to set. Simple and fast. Something like:

Set<TSource> set = new Set<TSource>(comparer);

foreach (TSource tSource in source)
{
     if (!set.Add(tSource))
          continue;

     yield return tSource;
}

GroupBy() allows you to group object by some key. In this case keys will be compared. It will need to execute key selector lambda for each item in collection. Also it will need to create grouping for each distinct key and add each item in collection to its group:

Func<TSource, TElement> elementSelector = x => x;

<TKey, TElement> lookup = new Lookup<TKey, TElement>(comparer);
foreach (TSource tSource in source)
{
     TKey key = keySelector(tSource);

     // simplified pseudo-code
     if (!lookup.Contains(key))
          lookup.Add(new Grouping(key)); 

     lookup[key].Add(elementSelector(tSource));
}

foreach(IGrouping<TKey, TElement> grouping in lookup)
    yield return grouping;

So, I think GroupBy() is not that fast as simple Distict().

Upvotes: 23

Related Questions