Scott

Reputation: 21

How do I take the Top N (percentage) values in a dictionary?

I have a dictionary with a string key and integer value. The value represents the number of occurrences of the key.

How do I create a new dictionary with the keys and values representing the top 25% of values? The sum of the selected values should be equal to or greater than the specified percentage of the sum of all values. For example, if my dictionary contains 5 items with values (5, 3, 2, 1, 1) and I want the top 50%, the new dictionary would contain values (5, 3) because their sum is 8 and that is >= 50% of 12 (i.e. 6). The dictionary needs to be sorted descending by value and then the top N taken such that their sum meets the specified percentage.

This code gives me the top N but is based on a known count. How do I take into account the desired percentage?

var topItemsCount = dictionary.OrderByDescending(entry => entry.Value)
                   .Take(topN)
                   .ToDictionary(pair => pair.Key, pair => pair.Value);

Upvotes: 2

Views: 1353

Answers (3)

maxwellb

Reputation: 13954

Rephrasing the question in two parts:

  1. Given a list of strings and values, find the value n that represents the desired percentage of the total.
  2. Given a list of strings and values, and that value n, return a new list of strings and values, taken in descending order, whose running sum is greater than or equal to n.

Question 1 would look like

double percent = inputValue; // a fraction, e.g. 0.25 for the top 25%
double n = dictionary.Values.Sum() * percent;

Question 2 would look like:

Dictionary<string, int> newValues = dictionary.OrderByDescending(_ => _.Value)
    .Aggregate(
        new {sum = 0.0, values = new Dictionary<string, int>()},
        (sumValues, kv) =>
        {
            if (sumValues.sum < n) // keep adding only until the running sum has reached n
                sumValues.values.Add(kv.Key, kv.Value);
            return new {sum = sumValues.sum + kv.Value, values = sumValues.values};
        },
        sumValues => sumValues.values);

You could also use a for loop and a running sum, but for running totals with limited scope, I like the compactness of the Aggregate function. The downside to this is that the entire source Dictionary is still iterated. A custom iterator method would get around this. For example:

public static class Extensions
{
    public static IEnumerable<TThis> TakeGreaterThan<TThis>(this IEnumerable<TThis> source, Func<TThis, double> valueFunc, double compareTo)
    {
        double sum = 0.0;
        // foreach disposes the enumerator, which the manual GetEnumerator() call did not
        foreach (TThis item in source.OrderByDescending(valueFunc))
        {
            // stop as soon as the running sum has reached the target
            if (sum >= compareTo)
                yield break;
            yield return item;
            sum += valueFunc(item);
        }
    }
}

Used as

Dictionary<string, int> newValues = dictionary.TakeGreaterThan(_ => _.Value, n).ToDictionary(_ => _.Key, _ => _.Value);
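
Putting the two parts together on the example from the question: the following is only a sketch, and TopPercent.Take and its fraction parameter are names I made up for illustration. It uses a plain loop that stops as soon as the running sum reaches the threshold.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TopPercent
{
    // Hypothetical helper: keeps the largest entries until their running
    // sum reaches `fraction` (0.0 to 1.0) of the total of all values.
    public static Dictionary<string, int> Take(IDictionary<string, int> source, double fraction)
    {
        double threshold = source.Values.Sum() * fraction; // part 1
        var result = new Dictionary<string, int>();
        double sum = 0.0;

        // part 2: walk the entries from largest to smallest value
        foreach (var pair in source.OrderByDescending(p => p.Value))
        {
            if (sum >= threshold)
                break;
            result.Add(pair.Key, pair.Value);
            sum += pair.Value;
        }
        return result;
    }
}
```

With the values (5, 3, 2, 1, 1) and a fraction of 0.5 the threshold is 6, so this keeps the entries with values 5 and 3. Keep in mind that Dictionary<TKey, TValue> does not document any enumeration order, so if the descending order matters downstream, a list of pairs may be the safer return type.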

Upvotes: 1

mmmdreg

Reputation: 6608

Something like:

var topItemsCount = dictionary.OrderByDescending(entry => entry.Value)
               .Take((int)Math.Floor(dictionary.Count * 0.25))
               .ToDictionary(pair => pair.Key, pair => pair.Value);

Count on a dictionary returns the number of key-value pairs in the collection. Math.Floor rounds the product down but returns a double, so it is cast to int for Take. Note that this takes the top 25% of entries by count, not by sum of values.

Edited to reflect comments

I would probably just use a simple non-linq solution to achieve what you want. Maybe more verbose, but it's pretty clear to anyone what it does:

var total = dictionary.Sum(e => e.Value);
var cutoff = total * 0.5;
var sum = 0;

var pairs = new List<KeyValuePair<string, int>>();
foreach (var pair in dictionary.OrderByDescending(e => e.Value))
{
    sum += pair.Value;
    pairs.Add(pair);

    // stop once the running sum reaches the cutoff
    if (sum >= cutoff)
        break;
}

dictionary = pairs.ToDictionary(pair => pair.Key, pair => pair.Value);

One more edit

If you really want more LINQ, you could keep the running sum in a class-level variable.

private static int sum = 0;

static void Main(string[] args)
{
    var dictionary = new Dictionary<string, int>()
    {
        {"1",5},         
        {"2",3},
        {"3",2},
        {"4",1},
        {"5",1},
    };

    var total = dictionary.Sum(e => e.Value);
    var cutoff = total * 0.5;

    var filtered = dictionary.OrderByDescending(e => e.Value)
        .TakeWhile(e => Add(e.Value).Item1 < cutoff)
        .ToDictionary(pair => pair.Key, pair => pair.Value);
}

private static Tuple<int, int> Add(int x)
{
    return Tuple.Create(sum, sum += x);
}

It's a bit convoluted: Add returns a tuple holding the sum before and after the addition, and TakeWhile tests the sum before, because the first value that reaches the cutoff must still be included in the result (i.e. even though 5 + 3 = 8 is greater than the cutoff of 6, you still include 3).
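
If the static field is a concern (it is never reset, so a second run would start from the old total), a local variable captured by the lambda should do the same job. This is a rough sketch of that variation, not a tested drop-in; RunningSumFilter.TopByShare and the share parameter are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RunningSumFilter
{
    // Hypothetical helper: same TakeWhile idea, but the running sum is a
    // local captured by the lambda, so every call starts again from zero.
    public static Dictionary<string, int> TopByShare(Dictionary<string, int> dictionary, double share)
    {
        double cutoff = dictionary.Values.Sum() * share;
        int sum = 0;

        return dictionary.OrderByDescending(e => e.Value)
            .TakeWhile(e =>
            {
                int before = sum;       // sum of the entries already taken
                sum += e.Value;
                return before < cutoff; // keep the entry that reaches the cutoff
            })
            .ToDictionary(pair => pair.Key, pair => pair.Value);
    }
}
```

The predicate still has a side effect, which is unusual for LINQ, so this relies on TakeWhile evaluating the entries once and in order.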

Upvotes: 1

Sriram Sakthivel

Reputation: 73482

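Maybe this? Note that it keeps the entries whose value is greater than the given percentage of the largest value, rather than accumulating a running sum.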

var dictionary = new Dictionary<string, int>()
{
    {"1",5},         
    {"2",3},
    {"3",2},
    {"4",1},
    {"5",1},
};

var max = dictionary.Values.Max();
int percent = 50;
int percentageValue = max*percent /100;

var topItems = dictionary.OrderByDescending(entry => entry.Value)
       .TakeWhile(x => x.Value > percentageValue)
       .ToDictionary(pair => pair.Key, pair => pair.Value);

foreach (var item in topItems)
{
    Console.WriteLine(item.Value);
}

Outputs:

 5
 3

Upvotes: 0
