Reputation: 883
I need to display the most frequently used word(s) in my text using C#. I'm using WinForms, VS2012.
The following code runs, but it displays "I like apples".
I could break the text up word by word to make it display "apples", but that doesn't seem efficient...
I'm new to programming, so simple code (it must be in C#) would be great :)
Thank you all in advance~
string[] source = { "I like apples.", "I like red apples.",
"I like red apples than green apples." };
var frequencies = new Dictionary<string, int>();
string highestWord = null;
int highestFreq = 0;
foreach (string word in source)
{
int freq;
frequencies.TryGetValue(word, out freq);
freq += 1;
if (freq > highestFreq)
{
highestFreq = freq;
highestWord = word;
}
frequencies[word] = freq;
}
this.lblFreqWords.Text = highestWord;
Upvotes: 0
Views: 3866
Reputation: 127603
Grant Winney's answer explains why your program does not work; however, there is an even better way to split out the words than just splitting on spaces and periods. Regex has the symbol \b, which represents a "word boundary". It also has \w, which matches any word character (letters, digits, and underscore). So the pattern \b\w+\b means "a word boundary, followed by one or more word characters, followed by a word boundary".
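// Requires: using System.Collections.Generic; using System.Linq; using System.Text.RegularExpressions;
string[] source = { "I like apples.", "I like red apples.",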
"I like red apples than green apples.",
"red red red apples, Yum!" };
var frequencies = new Dictionary<string, int>();
int highestFreq = 0;
var combinedString = string.Join(" ", source);
var matches = Regex.Matches(combinedString, @"\b\w+\b");
foreach (Match match in matches)
{
var word = match.Value;
int freq;
frequencies.TryGetValue(word, out freq);
freq += 1;
if (freq > highestFreq)
{
highestFreq = freq;
}
frequencies[word] = freq;
}
// This will hold a list of all the words that are tied for the highest frequency
var highestWords = frequencies.Where(x => x.Value == highestFreq).Select(x => x.Key).ToList();
Console.WriteLine("Highest freq: {0}", highestFreq);
foreach(var word in highestWords)
{
Console.WriteLine(word);
}
This will strip out the . in your sentences. If you want hyphenated words to show up as one word instead of two, you need to change the pattern to \b[\w-]+\b
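To make the difference between the two patterns concrete, here is a minimal sketch that runs both against one sentence containing a hyphenated word. The class name WordPatternDemo, the helper Words, and the sample sentence are made up for illustration; only Regex.Matches and the two patterns come from the answer above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of the original answer.
public static class WordPatternDemo
{
    // Returns every match of the given pattern in the text, in order.
    // Cast<Match>() keeps this working on older frameworks where
    // MatchCollection is not generic.
    public static string[] Words(string text, string pattern)
    {
        return Regex.Matches(text, pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        // Sample sentence chosen for illustration only.
        const string text = "I like sun-dried apples.";

        // \b\w+\b splits the hyphenated word in two.
        Console.WriteLine(string.Join("|", Words(text, @"\b\w+\b")));
        // prints: I|like|sun|dried|apples

        // \b[\w-]+\b keeps the hyphenated word together.
        Console.WriteLine(string.Join("|", Words(text, @"\b[\w-]+\b")));
        // prints: I|like|sun-dried|apples
    }
}
```

Both patterns drop the trailing period, so either can feed the frequency loop unchanged.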
Upvotes: 1
Reputation: 66501
It's because this line is actually iterating over each sentence, not over each individual word:
foreach (string word in source) // source is a collection of sentences
Without rewriting your entire program, the quickest way to get individual words out of your current collection would probably be to join the sentences into a single string (using string.Join), then split that string on spaces and periods. Try this:
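// RemoveEmptyEntries drops the empty strings produced where a period is followed by a space
var words = string.Join(" ", source).Split(new[] {' ', '.'}, StringSplitOptions.RemoveEmptyEntries);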
foreach (var word in words)
{
...
}
Upvotes: 2
Reputation: 13286
I would probably use LINQ. The following line will return an ordered IEnumerable<KeyValuePair<string, int>> that (theoretically) represents each word and its count of occurrences. You'll need to include some more cases for "special characters," like punctuation. But this is a good start.
char[] wordBreaks = new[] { ' ', '.', ',', '\'' };
return source.SelectMany(c => c.Split(wordBreaks, StringSplitOptions.RemoveEmptyEntries))
.GroupBy(c => c)
.Select(c => new KeyValuePair<string, int>(c.Key, c.Count()))
.OrderByDescending(c => c.Value);
Of course, once you've got that result (call it thatValue), you can grab thatValue.First().Key to find the most common word.
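For anyone who wants to run it, here is a minimal, self-contained sketch of that LINQ query wrapped in a method. The names WordCounter and CountWords are made up for illustration, and it passes StringSplitOptions.RemoveEmptyEntries so that empty strings are not counted as words. The sample sentences are the ones from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical wrapper class, not part of the original answer.
public static class WordCounter
{
    private static readonly char[] WordBreaks = { ' ', '.', ',', '\'' };

    // Returns each distinct word with its number of occurrences,
    // most frequent first. Matching is case-sensitive.
    public static IEnumerable<KeyValuePair<string, int>> CountWords(IEnumerable<string> source)
    {
        return source.SelectMany(s => s.Split(WordBreaks, StringSplitOptions.RemoveEmptyEntries))
                     .GroupBy(w => w)
                     .Select(g => new KeyValuePair<string, int>(g.Key, g.Count()))
                     .OrderByDescending(p => p.Value);
    }

    public static void Main()
    {
        string[] source = { "I like apples.", "I like red apples.",
                            "I like red apples than green apples." };

        var counts = CountWords(source);

        // "apples" occurs 4 times, more than any other word.
        Console.WriteLine(counts.First().Key); // prints: apples
    }
}
```

If several words can tie for first place, First() only returns one of them; filtering on the top count, as in the Regex answer, would return all of them.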
Upvotes: 1
Reputation: 25352
Try this:
string[] source = { "I like apples.", "I like red apples.",
"I like red apples than green apples." };
var frequencies = new Dictionary<string, int>();
string highestWord = null;
int highestFreq = 0;
var message = string.Join(" ", source);
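var splitChars = new char[] { ' ', '.' };
// RemoveEmptyEntries drops the empty strings produced where a period is followed by a space
var single = message.Split(splitChars, StringSplitOptions.RemoveEmptyEntries);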
foreach (var item in single)
{
int freq;
frequencies.TryGetValue(item, out freq);
freq += 1;
if (freq > highestFreq)
{
highestFreq = freq;
highestWord = item.Trim();
}
frequencies[item] = freq;
}
this.lblFreqWords.Text = highestWord;
Upvotes: 1