Michael T

Reputation: 719

linq groupby and max count

I have a class like

public class Test
{
    public string name;
    public int status;
}

Example data

new Test("Name1", 1);
new Test("Name2", 2);
new Test("Name3", 3);
new Test("Name4", 1);
new Test("Name5", 2);
new Test("Name6", 2);
new Test("Name7", 3);

I'm looking for a LINQ query that returns the value 2, which is the status that occurs most often.

Currently I have the following which is not correct.

  var status = listTest.GroupBy(x => x.status).Select(x => x.OrderByDescending(t => t.status).First()).FirstOrDefault().status;

I'm hoping there is something cleaner.

Upvotes: 1

Views: 4296

Answers (3)

Harald Coppoolse

Reputation: 30454

Requirement: Given a sequence of objects of class Test, where every Test has an int property Status, give me the value of Status that occurs the most.

For this, make groups of Test objects that have the same value for property Status. Count the number of elements in each group. Order the result so that the group with the largest count comes first, and take the first element.

IEnumerable<Test> testSequence = ...
var statusThatOccursMost = testSequence

    // make Groups of Tests that have the same value for Status:
    .GroupBy(test => test.Status,

        // parameter resultSelector: for every occurring Status value and all
        // Tests with this common status value, make one new object,
        // containing the common Status value and the number of Tests that have
        // this common Status value
        (commonStatusValue, testsThatHaveThisCommonStatusValue) => new
        {
            Status = commonStatusValue,
            Count = testsThatHaveThisCommonStatusValue.Count(),
        })

Result: a sequence of [Status, Count] combinations. Every Status occurs at least once in testSequence, and Count is the number of times that Status occurs, so we know that Count >= 1.

Order this sequence of [Status, Count] combinations by descending value of Count, so the first element is the one with the largest value for Count:

    .OrderByDescending(statusCountCombination => statusCountCombination.Count)

Result: a sequence of [Status, Count] combinations, where the combination with the largest value of Count comes first.

Extract the value of Status from the combination, and take the first one:

    .Select(statusCountCombination => statusCountCombination.Status)
    .FirstOrDefault();
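
Put together, the fragments above form a single query. The following is only a minimal sketch: it assumes Test exposes Name and Status as properties and has a two-argument constructor (the question declares lowercase fields and no constructor), and the names StatusQueries and MostCommonStatus are hypothetical, chosen just for this illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

// Assumption: a Test class shaped like the one in the question, but with
// properties and a constructor so that the sample data compiles.
public class Test
{
    public Test(string name, int status)
    {
        Name = name;
        Status = status;
    }

    public string Name { get; }
    public int Status { get; }
}

public static class StatusQueries   // hypothetical name
{
    // Returns the Status value that occurs most often,
    // or 0 (default(int)) if the sequence is empty.
    public static int MostCommonStatus(IEnumerable<Test> testSequence)
    {
        return testSequence
            // one [Status, Count] combination per distinct Status value
            .GroupBy(test => test.Status,
                (status, tests) => new { Status = status, Count = tests.Count() })
            // largest Count first
            .OrderByDescending(combination => combination.Count)
            .Select(combination => combination.Status)
            .FirstOrDefault();
    }
}
```

Called with the seven sample items from the question, StatusQueries.MostCommonStatus should return 2, because status 2 occurs three times and the other two statuses occur twice each.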

Optimization

Although this LINQ is fairly simple, it is not very efficient to count all Status values and order all StatusCount combinations, if you only want the one that has the largest value for Count.

Consider creating an extension method. If you are not familiar with extension methods, read Extension Methods Demystified.

Make a Dictionary: the key is the Status, the value is the number of times this Status has occurred. Then take the Status with the largest count.

public static int ToMostOccurringStatusValueOrDefault(
    this IEnumerable<Test> testSequence)
{
    // return default if testSequence is empty
    if (!testSequence.Any()) return 0;

    Dictionary<int, int> statusCountCombinations = new Dictionary<int, int>();
    foreach (Test test in testSequence)
    {
        if (statusCountCombinations.TryGetValue(test.Status, out int count))
        {
            // Status value already in dictionary: increase count:
            statusCountCombinations[test.Status] = count + 1;
        }
        else
        {
            // Status value not in dictionary yet. Add with count 1
            statusCountCombinations.Add(test.Status, 1);
        }
    }

GroupBy works similarly to the above, except that it first builds a lookup in which every value is a list of Tests. It then counts the number of Tests and throws away the list. In the extension method we don't have to make the list.

Continuing the extension method: find the KeyValuePair that has the largest Value. We can use Enumerable.Aggregate, or enumerate:

    using (var enumerator = statusCountCombinations.GetEnumerator())
    {
        // we know there is at least one element
        enumerator.MoveNext();
        // the first element is the largest until now:
        KeyValuePair<int, int> largest = enumerator.Current;

        // enumerate the rest:
        while (enumerator.MoveNext())
        {
            if (enumerator.Current.Value > largest.Value)
            {
                // found a new largest one
                largest = enumerator.Current;
            }
        }
        return largest.Key;
    }
}
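
The Enumerable.Aggregate alternative mentioned above could look roughly like the sketch below. The class and method names (DictionaryExtensions, KeyOfLargestValue) are hypothetical and only serve this illustration; the method works on any Dictionary<int, int> of counts, such as statusCountCombinations.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DictionaryExtensions   // hypothetical name
{
    // Returns the key whose value is the largest.
    // Assumes counts is not empty: Aggregate without a seed throws
    // InvalidOperationException when the sequence has no elements.
    public static int KeyOfLargestValue(this Dictionary<int, int> counts)
    {
        return counts
            // keep whichever of the two KeyValuePairs has the larger Value
            .Aggregate((largest, next) => next.Value > largest.Value ? next : largest)
            .Key;
    }
}
```

Inside the extension method, the enumerator block could then be replaced by a single return of statusCountCombinations.KeyOfLargestValue(). The early return for an empty testSequence still guarantees that the dictionary contains at least one element.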

In this method we only have to enumerate testSequence once (apart from the Any() check, which only looks at the first element) and the Dictionary once. If you used LINQ GroupBy / OrderByDescending, the result of GroupBy would be enumerated several times.

Usage:

IEnumerable<Test> testSequence = ...
var mostCommonStatus = testSequence.ToMostOccurringStatusValueOrDefault();

Upvotes: 0

Akshay Bheda

Reputation: 783

You can group by status, sort the groups by descending count, and take the key of the first group.

var value = list.GroupBy(q => q.status)
            .OrderByDescending(gp => gp.Count())
            .First().Key;

Upvotes: 1

Charlieface

Reputation: 71168

I think this is what you want

You need to order the groups themselves, not what is in each group.

var status = listTest
    .GroupBy(x => x.Status)
    .OrderByDescending(g => g.Count())
    .FirstOrDefault()?.Key;
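
If the project targets .NET 6 or later, Enumerable.MaxBy can pick the largest group without sorting all of them. This is only a sketch under that assumption. The names StatusExtensions and MostCommonOrNull are hypothetical, and the method takes the status values directly so that it does not depend on how Test is declared.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class StatusExtensions   // hypothetical name
{
    // Requires .NET 6 or later for Enumerable.MaxBy.
    // Returns the value that occurs most often, or null for an empty sequence.
    public static int? MostCommonOrNull(this IEnumerable<int> statuses)
    {
        return statuses
            .GroupBy(status => status)
            // MaxBy returns null here when there are no groups,
            // because IGrouping is a reference type
            .MaxBy(group => group.Count())
            ?.Key;
    }
}
```

With the list from the question this could be called as listTest.Select(x => x.Status).MostCommonOrNull(). As with FirstOrDefault()?.Key, the result is an int? that is null when the list is empty.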

Upvotes: 10
