user1344280

Dictionary<> value count c#

I have dictionary object like this:

var dictionary = new Dictionary<string, List<int>>();

The number of keys is not very large, but the lists of integers in the values can be quite large (on the order of thousands).

Given a list of keys (_keylist), I need to count how many times each integer appears in the lists for those keys and return the results ordered by frequency.

Output:

{int1, count1}
{int2, count2}
...

This is the solution I have come up with:

var query = _keylist
    .SelectMany(n => _dictionary[n])
    .GroupBy(g => g)
    .Select(g => new[] { g.Key, g.Count() })
    .OrderByDescending(g => g[1]);

Even though this query produces the desired result, it's not very efficient. Is there a clever way to produce the same result with less processing?
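For comparison, here is a minimal, self-contained version of the query above, assuming the Dictionary<string, List<int>> layout from the question. The class and method names (BaselineQuery, CountWithGroupBy) are hypothetical; the sketch returns (number, count) pairs instead of two-element arrays and materializes the result with ToList so it can be inspected.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class BaselineQuery
{
    // Counts how often each integer occurs across the lists of the given keys,
    // using GroupBy as in the question. Each group buffers all of its elements,
    // which is the overhead the answers below try to avoid.
    public static List<KeyValuePair<int, int>> CountWithGroupBy(
        Dictionary<string, List<int>> dictionary,
        IEnumerable<string> keylist)
    {
        return keylist
            .SelectMany(k => dictionary[k])   // flatten the selected lists
            .GroupBy(n => n)                  // one group per distinct integer
            .Select(g => new KeyValuePair<int, int>(g.Key, g.Count()))
            .OrderByDescending(p => p.Value)  // most frequent first
            .ToList();
    }
}
```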

Upvotes: 0

Views: 1678

Answers (2)

Timothy Shields

Reputation: 79441

From an algorithmic space- and time-usage point of view, the only thing I see that is suboptimal is the use of GroupBy when you don't actually need the groups (only the group counts). You can use the following extension method instead.

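using System;
using System.Collections.Generic;

public static class EnumerableExtensions
{
    // Counts how many items map to each key (SumBy with a value of 1 per item).
    public static Dictionary<K, int> CountBy<T, K>(
        this IEnumerable<T> source,
        Func<T, K> keySelector)
    {
        return source.SumBy(keySelector, item => 1);
    }

    // Sums valueSelector(item) per key in a single pass, storing only the running totals.
    public static Dictionary<K, int> SumBy<T, K>(
        this IEnumerable<T> source,
        Func<T, K> keySelector,
        Func<T, int> valueSelector)
    {
        if (source == null)
        {
            throw new ArgumentNullException(nameof(source));
        }
        if (keySelector == null)
        {
            throw new ArgumentNullException(nameof(keySelector));
        }
        if (valueSelector == null)
        {
            throw new ArgumentNullException(nameof(valueSelector));
        }
        var dictionary = new Dictionary<K, int>();
        foreach (var item in source)
        {
            var key = keySelector(item);
            // TryGetValue sets count to 0 when the key has not been seen yet.
            dictionary.TryGetValue(key, out int count);
            dictionary[key] = count + valueSelector(item);
        }
        return dictionary;
    }
}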

Note the advantage: the numbers are enumerated but not stored (GroupBy keeps every element in its group); only the counts are stored. Note also that the keySelector parameter is not even necessary in your case, since you count by the number itself, and I only included it to make the extension method slightly more general.
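To illustrate the storage point, here is a standalone sketch. The helper name CountOccurrences is hypothetical; it mirrors CountBy(n => n) above with the key selector fixed to the identity, so it can be compiled without the extension methods.

```csharp
using System.Collections.Generic;

public static class CountingSketch
{
    // Hypothetical helper mirroring CountBy(n => n): counts each distinct value in one pass.
    // Only the running totals are kept; the source elements themselves are never buffered.
    public static Dictionary<T, int> CountOccurrences<T>(IEnumerable<T> source)
        where T : notnull
    {
        var counts = new Dictionary<T, int>();
        foreach (var item in source)
        {
            // TryGetValue sets count to 0 when the value has not been seen yet.
            counts.TryGetValue(item, out int count);
            counts[item] = count + 1;
        }
        return counts;
    }
}
```

Counting Enumerable.Range(0, 1_000_000).Select(i => i % 3) this way leaves a dictionary with just three entries, whereas GroupBy would hold all one million elements across its three groups.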

The usage is then as follows.

var query = _keylist
    .SelectMany(k => _dictionary[k])
    .CountBy(n => n)
    .OrderByDescending(p => p.Value);

This will get you a sequence of KeyValuePair<int, int> where the Key is the number from your original lists and the Value is its count.
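As a rough sketch of getting from those pairs to the {int, count} output in the question: the FormatByFrequency helper is hypothetical and assumes the counts are already in a Dictionary<int, int>, which is what CountBy returns.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class OutputFormatting
{
    // Orders (number, count) pairs by descending count and renders each
    // as "{number, count}", matching the output shown in the question.
    public static List<string> FormatByFrequency(Dictionary<int, int> counts)
    {
        return counts
            .OrderByDescending(p => p.Value)          // p is KeyValuePair<int, int>
            .Select(p => $"{{{p.Key}, {p.Value}}}")   // Key = number, Value = count
            .ToList();
    }
}
```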


To handle a sequence of queries more efficiently, you can preprocess your data once, replacing each list with its counts.

Dictionary<string, Dictionary<int, int>> preprocessedDictionary
    = _dictionary.ToDictionary(p => p.Key, p => p.Value.CountBy(n => n));

Each query then sums the precomputed per-key counts instead of re-enumerating the full lists.

var query = _keylist
    .SelectMany(k => preprocessedDictionary[k])
    .SumBy(p => p.Key, p => p.Value)
    .OrderByDescending(p => p.Value);
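A standalone sketch of the same two-phase idea, assuming the Dictionary<string, List<int>> input from the question. Preprocess and Query are hypothetical names; the one-time step uses GroupBy because its cost is paid once rather than per query.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class PreprocessedCounts
{
    // One-time step: replace each key's list with a (number -> count) map.
    public static Dictionary<string, Dictionary<int, int>> Preprocess(
        Dictionary<string, List<int>> dictionary)
    {
        return dictionary.ToDictionary(
            p => p.Key,
            p => p.Value.GroupBy(n => n).ToDictionary(g => g.Key, g => g.Count()));
    }

    // Per-query step: add up the precomputed counts for the requested keys.
    public static List<KeyValuePair<int, int>> Query(
        Dictionary<string, Dictionary<int, int>> preprocessed,
        IEnumerable<string> keylist)
    {
        var totals = new Dictionary<int, int>();
        foreach (var key in keylist)
        {
            foreach (var pair in preprocessed[key])
            {
                // TryGetValue sets sum to 0 for a number not seen in earlier keys.
                totals.TryGetValue(pair.Key, out int sum);
                totals[pair.Key] = sum + pair.Value;
            }
        }
        return totals.OrderByDescending(p => p.Value).ToList();
    }
}
```

Whether this pays off depends on how many queries you run and how many duplicates each list contains: if a list holds mostly distinct values, its per-key map is about as large as the list itself.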

Upvotes: 2

Enigmativity

Reputation: 117027

I would do it this way:

var query =
    from k in _keylist
    from v in dictionary[k]
    group v by v into gvs
    let result = new
    {
        key = gvs.Key,
        count = gvs.Count(),
    }
    orderby result.count descending
    select result;

To me this is straightforward and simple, and well worth any (minor) performance hit from using LINQ.


An alternative approach that doesn't create the large set of groups is this:

var query =
    _keylist
        .SelectMany(k => dictionary[k])
        .Aggregate(
            new Dictionary<int, int>(),
            (d, v) =>
            {
                if (d.ContainsKey(v))
                {
                    d[v] += 1;
                }
                else
                {
                    d[v] = 1;
                }
                return d;
            })
        .OrderByDescending(kvp => kvp.Value)
        .Select(kvp => new
        {
            key = kvp.Key,
            count = kvp.Value,
        });
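The accumulator above does three dictionary lookups when a value has already been seen (ContainsKey, then the read and the write inside d[v] += 1). Here is a sketch of the same Aggregate call using TryGetValue, which needs two lookups per value whether or not it has been seen; the class and method names are hypothetical, and the result is returned as (number, count) pairs rather than anonymous objects.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class AggregateVariant
{
    public static List<KeyValuePair<int, int>> CountWithAggregate(
        Dictionary<string, List<int>> dictionary,
        IEnumerable<string> keylist)
    {
        return keylist
            .SelectMany(k => dictionary[k])
            .Aggregate(
                new Dictionary<int, int>(),
                (d, v) =>
                {
                    // TryGetValue yields 0 for a value not seen yet,
                    // so hits and misses share one code path.
                    d.TryGetValue(v, out int count);
                    d[v] = count + 1;
                    return d;
                })
            .OrderByDescending(kvp => kvp.Value)
            .ToList();
    }
}
```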

Upvotes: 2
