Michael Goldshteyn
Michael Goldshteyn

Reputation: 74390

Collection of strings to dictionary

Given an ordered collection of strings:

var strings = new string[] { "abc", "def", "def", "ghi", "ghi", "ghi", "klm" };

Use LINQ to create a dictionary of string to number of occurrences of that string in the collection:

IDictionary<string,int> stringToNumOccurrences = ...;

Preferably do this in a single pass over the strings collection...

Upvotes: 2

Views: 1248

Answers (6)

corvuscorax
corvuscorax

Reputation: 5940

This is a foreach version like the one that Jon mentions that he finds "pretty icky" in his answer. I'm putting it in here, so there's something concrete to talk about.

I must admit that I find it simpler than Jon's version and can't really see what's icky about it. Jon? Anyone?

static Dictionary<string, int> CountOrderedSequence(IEnumerable<string> source)
{
    var result = new Dictionary<string, int>();
    string prev = null;
    int count = 0;
    foreach (var s in source)
    {
        if (prev != s && count > 0)
        {
            result.Add(prev, count);
            count = 0;
        }
        prev = s;
        ++count;
    }
    if (count > 0)
    { 
        result.Add(prev, count);
    }
    return result;
}

Updated to add a necessary check for empty source - I still think it's simpler than Jon's :-)

Upvotes: 0

Jon Skeet
Jon Skeet

Reputation: 1500675

Timwi/Darin's suggestion will perform this in a single pass over the original collection, but it will create multiple buffers for the groupings. LINQ isn't really very good at doing this kind of counting, and a problem like this was my original motiviation for writing Push LINQ. You might like to read my blog post on it for more details about why LINQ isn't terribly efficient here.

Push LINQ and the rather more impressive implementation of the same idea - Reactive Extensions - can handle this more efficiently.

Of course, if you don't really care too much about the extra efficiency, go with the GroupBy answer :)

EDIT: I hadn't noticed that your strings were ordered. That means you can be much more efficient, because you know that once you've seen string x and then string y, if x and y are different, you'll never see x again. There's nothing in LINQ to make this particularly easier, but you can do it yourself quite easily:

public static IDictionary<string, int> CountEntries(IEnumerable<string> strings)
{
    var dictionary = new Dictionary<string, int>();

    using (var iterator = strings.GetEnumerator())
    {
        if (!iterator.MoveNext())
        {
            // No entries
            return dictionary;
        }
        string current = iterator.Current;
        int currentCount = 1;
        while (iterator.MoveNext())
        {
            string next = iterator.Current;
            if (next == current)
            {
                currentCount++;
            }
            else
            {
                dictionary[current] = currentCount;
                current = next;
                currentCount = 1;
            }
        }
        // Write out the trailing result
        dictionary[current] = currentCount;
    }
    return dictionary;
}

This is O(n), with no dictionary lookups involved other than when writing the values. An alternative implementation would use foreach and a current value starting off at null... but that ends up being pretty icky in a couple of other ways. (I've tried it :) When I need special-case handling for the first value, I generally go with the above pattern.

Actually you could do this with LINQ using Aggregate, but it would be pretty nasty.

Upvotes: 5

Timwi
Timwi

Reputation: 66573

The standard LINQ way is this:

stringToNumOccurrences = strings.GroupBy(s => s)
                                .ToDictionary(g => g.Key, g => g.Count());

Upvotes: 3

Timwi
Timwi

Reputation: 66573

If you are looking for a particularly efficient (fast) solution, then GroupBy is probably too slow for you. You could use a loop:

var strings = new string[] { "abc", "def", "def", "ghi", "ghi", "ghi", "klm" };
var stringToNumOccurrences = new Dictionary<string, int>();
foreach (var str in strings)
{
    if (stringToNumOccurrences.ContainsKey(str))
        stringToNumOccurrences[str]++;
    else
        stringToNumOccurrences[str] = 1;
}
return stringToNumOccurrences;

Upvotes: 0

Dan Tao
Dan Tao

Reputation: 128307

If this is actual production code, I'd go with Timwi's response.

If this is indeed homework and you're expected to write your own implementation, it shouldn't be too tough. Here are just a couple of hints to point you in the right direction:

  1. Dictionary<TKey, TValue> has a ContainsKey method.
  2. The IDictionary<TKey, TValue> interface's this[TKey] property is settable; i.e., you can do dictionary[key] = 1 (which means you can also do dictionary[key] += 1).

From those clues I think you should be able to figure out how to do it "by hand."

Upvotes: 0

Darin Dimitrov
Darin Dimitrov

Reputation: 1038850

var dico = strings.GroupBy(x => x).ToDictionary(x => x.Key, x => x.Count());

Upvotes: 8

Related Questions