RPM1984

Reputation: 73112

Given a collection of strings, count the number of times each word appears in a List<T>

Input 1: List<string>, e.g.:

"hello", "world", "stack", "overflow".

Input 2: List<Foo> (Foo has two string properties, a and b), e.g.:

Foo 1: a: "Hello there!" b: string.Empty

Foo 2: a: "I love Stack Overflow" b: "It's the best site ever!"

So I want to end up with a Dictionary<string, int> that maps each word to the number of times it appears in the List<Foo>, in either the a or the b field.

Here is my current first-pass, off-the-top-of-my-head code, which is far too slow:

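// For each word in input 1, count how many Foos contain it in a, plus how many contain it in b.
// Note: Contains is a case-sensitive substring test, and Count counts matching Foos rather than
// repeated occurrences within one field. Every word rescans every Foo, which is why this is slow.
var occurrences = new Dictionary<string, int>();
foreach (var word in uniqueWords /* input 1 */)
{
    var aOccurrences = foos.Count(x => !string.IsNullOrEmpty(x.a) && x.a.Contains(word));
    var bOccurrences = foos.Count(x => !string.IsNullOrEmpty(x.b) && x.b.Contains(word));
    occurrences.Add(word, aOccurrences + bOccurrences);
}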

Upvotes: 2

Views: 986

Answers (2)

Aron

Reputation: 15772

You could try concatenating the two strings a and b, then using a regex to pull all the words out into a collection, and finally counting them with a group-by query.

For example:

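using System;
using System.Linq;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        var a = "Hello there!";
        var b = "It's the best site ever!";

        // Join with a space so the last word of a doesn't merge with the first word of b.
        var ab = a + " " + b;

        // Treat each run of letters as a word, then group case-insensitively and count.
        var matches = Regex.Matches(ab, "[A-Za-z]+");
        var occurrences = from m in matches.Cast<Match>()
                          let word = m.Value.ToLowerInvariant()
                          group word by word into g
                          select new { Word = g.Key, Count = g.Count() };
        var result = occurrences.ToDictionary(x => x.Word, x => x.Count);

        foreach (var pair in result)
            Console.WriteLine($"{pair.Key}: {pair.Value}");
    }
}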

Edit: I just reread the requirement (a bit strange, but fine). Here is a version with some of the suggested changes that works over the whole list of Foo objects:

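using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public class Foo
{
    public string A { get; set; }
    public string B { get; set; }
}

public static class Program
{
    public static void Main()
    {
        var counts = GetCount(new[] {
            new Foo { A = "Hello there!", B = string.Empty },
            new Foo { A = "I love Stack Overflow", B = "It's the best site ever!" }
        });
        foreach (var pair in counts)
            Console.WriteLine($"{pair.Key}: {pair.Value}");
    }

    public static IDictionary<string, int> GetCount(IEnumerable<Foo> inputs)
    {
        // Words are runs of letters and apostrophes, so "It's" stays one word.
        // A null field is treated as empty. Note: this counts every word found,
        // not only the words from input 1.
        var allWords = from input in inputs
                       let matchesA = Regex.Matches(input.A ?? string.Empty, "[A-Za-z']+").Cast<Match>()
                       let matchesB = Regex.Matches(input.B ?? string.Empty, "[A-Za-z']+").Cast<Match>()
                       from m in matchesA.Concat(matchesB)
                       select m.Value;

        // Group case-insensitively, then build a case-insensitive dictionary of counts.
        return allWords
            .GroupBy(w => w, StringComparer.OrdinalIgnoreCase)
            .ToDictionary(g => g.Key, g => g.Count(), StringComparer.OrdinalIgnoreCase);
    }
}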

Upvotes: 0

dahlbyk

Reputation: 77530

Roughly:

  1. Build a dictionary (occurrences) from the first input, with every count starting at 0, optionally using a case-insensitive comparer.
  2. For each Foo in the second input, use a Regex to split a and b into words.
  3. For each word, check whether it is a key in occurrences. If it is, increment its value in the dictionary.
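The three steps above might look roughly like this in C#. This is a minimal sketch, assuming a Foo class with public string fields a and b (as described in the question), that a "word" is a run of letters and apostrophes, and that matching should be case-insensitive; WordCounter and CountWords are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical Foo mirroring the question: two string fields, a and b.
public class Foo
{
    public string a;
    public string b;
}

public static class WordCounter
{
    public static Dictionary<string, int> CountWords(IEnumerable<string> uniqueWords, IEnumerable<Foo> foos)
    {
        // Step 1: seed the dictionary with every word from input 1 at a count of 0.
        // The case-insensitive comparer lets "Hello" in the text match the key "hello".
        var occurrences = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        foreach (var word in uniqueWords)
            occurrences[word] = 0;

        var wordPattern = new Regex("[A-Za-z']+");
        foreach (var foo in foos)
        {
            // Step 2: split both fields into words, skipping null or empty fields.
            foreach (var text in new[] { foo.a, foo.b })
            {
                if (string.IsNullOrEmpty(text)) continue;
                foreach (Match m in wordPattern.Matches(text))
                {
                    // Step 3: only words already in the dictionary (from input 1) are counted.
                    if (occurrences.TryGetValue(m.Value, out var count))
                        occurrences[m.Value] = count + 1;
                }
            }
        }
        return occurrences;
    }

    public static void Main()
    {
        var words = new List<string> { "hello", "world", "stack", "overflow" };
        var foos = new List<Foo>
        {
            new Foo { a = "Hello there!", b = string.Empty },
            new Foo { a = "I love Stack Overflow", b = "It's the best site ever!" }
        };
        foreach (var pair in CountWords(words, foos))
            Console.WriteLine($"{pair.Key}: {pair.Value}");
    }
}
```

For the sample inputs in the question this gives hello = 1, world = 0, stack = 1 and overflow = 1. Each field is scanned once no matter how many words are in input 1, and each word costs a single dictionary lookup, so the work grows with the total amount of text rather than with words × Foos as in the original loop.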

Upvotes: 1
