FSm

Reputation: 2057

Counting the word frequency after stemming the words

Assume that I have the following string:

"present present present presenting presentation do  do doing " 

I'm counting the words in the string by frequency, in descending order, using GroupBy and Count:
present      3
do           2
doing        1
presenting   1
presentation 1

Then I stem the words, keeping the counts in a 2D array [ , ] or any other structure:

present  3
do       2
do       1
present  1
present  1

Finally, I want to recount the stemmed words into a dictionary, so that the output is:

present 5
do      3

Can anyone help, please? Thanks in advance.

Upvotes: 0

Views: 1435

Answers (2)

Matan Shahar

Reputation: 3240

        //Using List instead of Dictionary to allow keys multiplicity:
        List<KeyValuePair<string, int>> words = new List<KeyValuePair<string, int>>();

        string text = "present present present presenting presentation do  do doing";
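        //RemoveEmptyEntries skips the empty strings produced by the double space:
        var ws = text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);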

        //Passing the words into the list:
        words = (from w in ws
                 group w by w into wsGroups
                 select new KeyValuePair<string, int>(
                     wsGroups.Key, wsGroups.Count()
                     )
                 ).ToList<KeyValuePair<string, int>>();

        //Ordering:
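        //OrderByDescending returns a new sequence, so the result must be assigned back:
        words = words.OrderByDescending(w => w.Value).ToList();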

        //Stemming the words:
        words = (from w in words
                 select new KeyValuePair<string, int>
                     (
                         stemword(w.Key),
                         w.Value
                     )).ToList<KeyValuePair<string, int>>();

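        //Regrouping by stem, summing the earlier counts, and putting into a Dictionary:
        var wordsRef = (from w in words
                        group w by w.Key into groups
                        select new
                        {
                            //Sum the per-word counts; Count() would only count distinct words per stem
                            count = groups.Sum(g => g.Value),
                            word = groups.Key
                        }).ToDictionary(w => w.word, w => w.count);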
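
The passes above can also be collapsed into one: if each word is stemmed first and the grouping is done on the stem, the size of each group is already the combined count. A minimal sketch, assuming a stand-in `ToyStem` helper (hypothetical, it only strips "ation" and "ing"; a real stemmer such as a Porter implementation would go in its place). The names `StemCounter` and `CountByStem` are also made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StemCounter
{
    // Hypothetical stand-in for a real stemmer: strips two suffixes only.
    public static string ToyStem(string word)
    {
        foreach (var suffix in new[] { "ation", "ing" })
        {
            if (word.Length > suffix.Length &&
                word.EndsWith(suffix, StringComparison.Ordinal))
            {
                return word.Substring(0, word.Length - suffix.Length);
            }
        }
        return word;
    }

    // Splits the text on spaces, groups the words by their stem,
    // and maps each stem to the number of words that reduced to it.
    public static Dictionary<string, int> CountByStem(
        string text, Func<string, string> stem)
    {
        return text
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .GroupBy(stem)
            .ToDictionary(g => g.Key, g => g.Count());
    }
}
```

With the string from the question, `StemCounter.CountByStem(text, StemCounter.ToyStem)` should give present = 5 and do = 3. A Dictionary does not promise any enumeration order, so sort when printing, e.g. `counts.OrderByDescending(kv => kv.Value)`.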

Upvotes: 1

Alexei Levenkov

Reputation: 100545

LINQ GroupBy or Aggregate are good methods to compute such counts.
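
For the Aggregate route, a rough sketch of what that could look like: the fold carries a dictionary as its accumulator and bumps one entry per word. `AggregateCounter` and the `keyOf` parameter are hypothetical names; `keyOf` is where a stemmer would be plugged in (or `w => w` for the non-stemmed counts).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AggregateCounter
{
    // Folds the words into a dictionary, adding 1 to the entry
    // for keyOf(word) on every step.
    public static Dictionary<string, int> Count(
        IEnumerable<string> words, Func<string, string> keyOf)
    {
        return words.Aggregate(
            new Dictionary<string, int>(),
            (counts, word) =>
            {
                string key = keyOf(word);
                int current;
                // TryGetValue leaves current at 0 when the key is absent.
                counts.TryGetValue(key, out current);
                counts[key] = current + 1;
                return counts;
            });
    }
}
```

Calling it twice over the same word list, once with the identity function and once with the stemmer, gives the non-stemmed and the stemmed counts respectively.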

If you want to do it by hand: it looks like you want two sets of results, one for the non-stemmed words and one for the stemmed ones:

void incrementCount(Dictionary<string, int> counts, string word)
{
  // Dictionary looks up keys with ContainsKey
  if (counts.ContainsKey(word))
  {
    counts[word]++;
  }
  else
  {
    // first occurrence counts as 1
    counts.Add(word, 1);
  }
}

var stemmedCount = new Dictionary<string, int>();
var nonStemmedCount = new Dictionary<string, int>();

foreach (var word in words)
{
  incrementCount(stemmedCount, Stem(word));
  incrementCount(nonStemmedCount, word);
}

Upvotes: 0
