user3601310

Reputation: 883

Most frequently used word(s) in the text in C#, WinForm

I need to display the most frequently used word(s) in my text using C#. I'm using WinForm, VS2012.

The following code runs, but it displays "I like apples" instead of a single word.

I could break the text up word by word to make it display "apples", but that doesn't seem efficient...

I'm new to programming, so simpler code (it must be in C#) would be great :)

Thank you all in advance~

string[] source = { "I like apples.", "I like red apples.",
                    "I like red apples than green apples." };

var frequencies = new Dictionary<string, int>();
string highestWord = null;
int highestFreq = 0;

foreach (string word in source)
{
    int freq;
    frequencies.TryGetValue(word, out freq);
    freq += 1;

    if (freq > highestFreq)
    {
        highestFreq = freq;
        highestWord = word;
    }
    frequencies[word] = freq;
}

this.lblFreqWords.Text = highestWord;

Upvotes: 0

Views: 3866

Answers (4)

Scott Chamberlain

Reputation: 127603

Grant Winney's answer explains why your program does not work; however, there is an even better way to split out the words than just splitting on spaces and periods. Regex has the symbol \b, which represents a "word boundary", and \w, which matches any word character (letters, digits, and underscore). So the pattern \b\w+\b means "a word boundary, followed by one or more word characters, followed by a word boundary".

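    // Needs: using System.Collections.Generic; using System.Linq; using System.Text.RegularExpressions;
    string[] source = { "I like apples.", "I like red apples.",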
                             "I like red apples than green apples.", 
                             "red red red apples, Yum!" };

    var frequencies = new Dictionary<string, int>();
    int highestFreq = 0;

    var combinedString = string.Join(" ", source);
    var matches = Regex.Matches(combinedString, @"\b\w+\b");
    foreach (Match match in matches)
    {
        var word = match.Value;

        int freq;
        frequencies.TryGetValue(word, out freq);
        freq += 1;

        if (freq > highestFreq)
        {
            highestFreq = freq;
        }
        frequencies[word] = freq;
    }
    // This will hold every word whose count equals the highest frequency (so ties are kept)
    var highestWords = frequencies.Where(x => x.Value == highestFreq).Select(x => x.Key).ToList();

    Console.WriteLine("Highest freq: {0}", highestFreq);
    foreach(var word in highestWords)
    {
        Console.WriteLine(word);
    }


This will strip out that . in your sentence. If you want hyphenated words to show up as one word instead of two, change the pattern to \b[\w-]+\b
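
To make the difference between the two patterns concrete, here is a minimal sketch that runs both over the same sentence. The WordPatternDemo class, the Words helper, and the sample text are hypothetical names made up for illustration; only Regex.Matches and the two patterns come from the answer above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class WordPatternDemo
{
    // Hypothetical helper: returns the text of every match of the pattern.
    public static string[] Words(string text, string pattern)
    {
        return Regex.Matches(text, pattern)
                    .Cast<Match>()          // MatchCollection is non-generic on older frameworks
                    .Select(m => m.Value)
                    .ToArray();
    }

    static void Main()
    {
        const string text = "A well-known, sugar-free apple.";

        // \w does not match '-', so each hyphenated word is split in two.
        Console.WriteLine(string.Join("|", Words(text, @"\b\w+\b")));
        // prints: A|well|known|sugar|free|apple

        // Adding '-' to the character class keeps hyphenated words together.
        Console.WriteLine(string.Join("|", Words(text, @"\b[\w-]+\b")));
        // prints: A|well-known|sugar-free|apple
    }
}
```

Both patterns drop the comma and the period, because neither character is a word character.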

Upvotes: 1

Grant Winney

Reputation: 66501

It's because this line is actually iterating over each sentence, not over each individual word:

foreach (string word in source)  // source is a collection of sentences

Without rewriting your entire program, the quickest way to get individual words out of your current collection would probably be to:

  • Flatten all the sentences into one long sentence (using string.Join), then
  • Split that on spaces to get individual words (and on "." to get the periods out of the way)

Try this:

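// RemoveEmptyEntries drops the empty strings left where ". " is split twice
var words = string.Join(" ", source).Split(new[] {' ', '.'}, StringSplitOptions.RemoveEmptyEntries);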

foreach (var word in words)
{
    ...
}
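
For reference, a minimal sketch of the join-then-split approach combined with the counting logic from the question might look like the following. The SplitCountDemo class and the MostFrequent helper are hypothetical names, and returning a KeyValuePair is just one convenient choice. The sketch passes StringSplitOptions.RemoveEmptyEntries so that the empty strings produced by a period followed by a space are not counted as words.

```csharp
using System;
using System.Collections.Generic;

static class SplitCountDemo
{
    // Hypothetical helper: returns the most frequent word and its count.
    public static KeyValuePair<string, int> MostFrequent(string[] sentences)
    {
        // Flatten the sentences, then split on spaces and periods.
        var words = string.Join(" ", sentences)
                          .Split(new[] { ' ', '.' }, StringSplitOptions.RemoveEmptyEntries);

        var frequencies = new Dictionary<string, int>();
        string highestWord = null;
        int highestFreq = 0;

        foreach (var word in words)
        {
            int freq;
            frequencies.TryGetValue(word, out freq); // freq stays 0 for a new word
            freq += 1;
            frequencies[word] = freq;

            if (freq > highestFreq)
            {
                highestFreq = freq;
                highestWord = word;
            }
        }

        return new KeyValuePair<string, int>(highestWord, highestFreq);
    }

    static void Main()
    {
        string[] source = { "I like apples.", "I like red apples.",
                            "I like red apples than green apples." };

        var top = MostFrequent(source);
        Console.WriteLine("{0} ({1})", top.Key, top.Value); // prints: apples (4)
    }
}
```

In the WinForms code from the question, the last line would assign top.Key to this.lblFreqWords.Text instead of writing to the console.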

Upvotes: 2

Matthew Haugen

Reputation: 13286

I would probably use LINQ. The following line will return an ordered IEnumerable<KeyValuePair<string, int>> that (theoretically) represents each word and its count of occurrences. You'll need to include some more cases for "special characters," like punctuation. But this is a good start.

char[] wordBreaks = new[] { ' ', '.', ',', '\'' };

return source.SelectMany(c => c.Split(wordBreaks, StringSplitOptions.RemoveEmptyEntries))
             .GroupBy(c => c)
             .Select(c => new KeyValuePair<string, int>(c.Key, c.Count()))
             .OrderByDescending(c => c.Value);

Of course, once you've got that result, you can call .First().Key on it to find the most common word.
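
As a rough, self-contained sketch of how that query could be wrapped and called: the LinqCountDemo class, the CountWords name, and the case-insensitive comparer are assumptions added for illustration, not part of the answer above. RemoveEmptyEntries keeps empty strings out of the counts.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LinqCountDemo
{
    // Hypothetical wrapper around the query so it can be called and reused.
    public static IEnumerable<KeyValuePair<string, int>> CountWords(IEnumerable<string> source)
    {
        char[] wordBreaks = { ' ', '.', ',', '\'' };

        return source.SelectMany(s => s.Split(wordBreaks, StringSplitOptions.RemoveEmptyEntries))
                     // Assumption: "Apples" and "apples" should count as the same word.
                     .GroupBy(w => w, StringComparer.OrdinalIgnoreCase)
                     .Select(g => new KeyValuePair<string, int>(g.Key, g.Count()))
                     .OrderByDescending(p => p.Value);
    }

    static void Main()
    {
        string[] source = { "I like apples.", "I like red apples.",
                            "I like red apples than green apples." };

        // ToList() runs the query once; First() is then the most common word.
        var counts = CountWords(source).ToList();
        var top = counts.First();
        Console.WriteLine("{0} ({1})", top.Key, top.Value); // prints: apples (4)
    }
}
```

If several words share the highest count, First() returns only one of them, so filter on top.Value when you need all of the ties.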

Upvotes: 1

Anik Islam Abhi

Reputation: 25352

Try this

string[] source = { "I like apples.", "I like red apples.",
                    "I like red apples than green apples." };

var frequencies = new Dictionary<string, int>();
string highestWord = null;
int highestFreq = 0;

var message = string.Join(" ", source);
var splichar = new char[] { ' ', '.' };
// RemoveEmptyEntries skips the empty strings produced where ". " is split twice
var single = message.Split(splichar, StringSplitOptions.RemoveEmptyEntries);
foreach (var item in single)
{
    int freq;
    frequencies.TryGetValue(item, out freq); // freq stays 0 for a new word
    freq += 1;

    if (freq > highestFreq)
    {
        highestFreq = freq;
        highestWord = item.Trim();
    }
    frequencies[item] = freq;
}

this.lblFreqWords.Text = highestWord;

Upvotes: 1
