I need to read a text file and print the frequencies of its words in descending order. However, the assignment defines a word as "any sequence of consecutive letters that is not preceded or followed by a letter".
Is there a way to define the wordBreak characters as anything outside the English alphabet, or to use a regex instead?
For example, the program should recognize the string "a198$a1a1a'ač a" as the word "a" with a frequency of 6.
```csharp
{
    char[] wordBreak = new char[] { ' ', ',', ';', '.', '/', '\"', '[', ']', '!' };
    var wordFreq = new Dictionary<string, int>();
    using (var fileStream = File.Open("text.in", FileMode.Open, FileAccess.Read))
    using (var streamReader = new StreamReader(fileStream))
    {
        string line;
        while ((line = streamReader.ReadLine()) != null)
        {
            var words = line.Split(wordBreak, StringSplitOptions.RemoveEmptyEntries);
            foreach (var word in words)
            {
                if (wordFreq.ContainsKey(word))
                {
                    wordFreq[word]++;
                }
                else
                {
                    wordFreq.Add(word, 1);
                }
            }
        }
    }
}
```
Okay, I did this and it works, but there's probably a better way to do it:
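```csharp
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        var wordFreq = new Dictionary<string, int>();
        using (var fileStream = File.Open("text.in", FileMode.Open, FileAccess.Read))
        using (var streamReader = new StreamReader(fileStream))
        {
            string line;
            while ((line = streamReader.ReadLine()) != null)
            {
                // Split on every run of characters that are not English letters.
                var words = Regex.Split(line, @"[^A-Za-z]+");
                foreach (var word in words)
                {
                    // Split yields "" when the line is empty or begins/ends with a separator.
                    if (word.Length == 0) { continue; }
                    if (wordFreq.ContainsKey(word))
                    {
                        wordFreq[word]++;
                    }
                    else
                    {
                        wordFreq.Add(word, 1);
                    }
                }
            }
        }
    }
}
```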
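The loop above builds wordFreq but never prints it, while the assignment asks for the frequencies in descending order. A minimal sketch of that last step, with hypothetical names (FrequencyPrinter, SortByFrequency) and an assumed one-pair-per-line "word count" output format, since the exact format isn't stated:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class FrequencyPrinter
{
    // Hypothetical helper: orders word/count pairs by descending count.
    // Ties are broken alphabetically (ordinal) so the output order is deterministic.
    public static List<KeyValuePair<string, int>> SortByFrequency(Dictionary<string, int> wordFreq)
    {
        return wordFreq
            .OrderByDescending(pair => pair.Value)
            .ThenBy(pair => pair.Key, StringComparer.Ordinal)
            .ToList();
    }

    // Assumed output format: "word count", one pair per line.
    public static void Print(Dictionary<string, int> wordFreq)
    {
        foreach (var pair in SortByFrequency(wordFreq))
        {
            Console.WriteLine($"{pair.Key} {pair.Value}");
        }
    }
}

static class Program
{
    static void Main()
    {
        // Hypothetical sample counts standing in for the wordFreq dictionary built above.
        var wordFreq = new Dictionary<string, int> { ["b"] = 2, ["a"] = 6, ["c"] = 2 };
        FrequencyPrinter.Print(wordFreq);
        // prints "a 6", then "b 2", then "c 2"
    }
}
```

The ThenBy call is optional: OrderByDescending is a stable sort, so without it words with equal counts simply keep the order in which the dictionary enumerates them.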
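Yes, you can use Regex. For example, Regex.Matches with the pattern [a-zA-Z]+ returns every run of English letters, which you can then group and count with LINQ: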
```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

MatchCollection matches = Regex.Matches("a198$a1a1a'ač a", "[a-zA-Z]+");
var wordFreqs = matches
    .Cast<Match>()
    .GroupBy(a => a.Value)
    .OrderByDescending(a => a.Count())
    .Select(a => new { Word = a.Key, Freq = a.Count() });
foreach (var wordFreq in wordFreqs)
    Console.WriteLine($"\"{wordFreq.Word}\" occurs {wordFreq.Freq} times");
// prints: "a" occurs 6 times
```
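The same query could be wrapped in a reusable method so it can be applied to the whole input file rather than a literal string. This is a sketch with hypothetical names (RegexWordCount, Count): it counts each group once before sorting, and breaks ties alphabetically so the order is stable. Reading the file with File.ReadAllText assumes the input is small enough to hold in memory.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class RegexWordCount
{
    // Returns (word, count) pairs for every run of ASCII letters in text,
    // most frequent first; ties are ordered alphabetically (ordinal comparison).
    public static (string Word, int Freq)[] Count(string text)
    {
        return Regex.Matches(text, "[a-zA-Z]+")
            .Cast<Match>()
            .GroupBy(m => m.Value)
            .Select(g => (Word: g.Key, Freq: g.Count())) // count each group only once
            .OrderByDescending(p => p.Freq)
            .ThenBy(p => p.Word, StringComparer.Ordinal)
            .ToArray();
    }
}

static class Program
{
    static void Main()
    {
        // For the assignment, pass the file contents instead, e.g.
        // RegexWordCount.Count(System.IO.File.ReadAllText("text.in")).
        foreach (var (word, freq) in RegexWordCount.Count("a198$a1a1a'ač a"))
            Console.WriteLine($"\"{word}\" occurs {freq} times");
        // prints: "a" occurs 6 times
    }
}
```

Because a newline is not a letter, matching over the whole file gives the same words as matching line by line.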
One way to do this would be to simply walk the string character by character. If the character is a letter, append it to a currentWord string. If it's not a letter and currentWord has some characters, then either add that word to a dictionary (with the value 1) or increment the count for that word if it already exists:
```csharp
private static Dictionary<string, int> GetWords(string input)
{
    var result = new Dictionary<string, int>();
    if (string.IsNullOrWhiteSpace(input)) return result;
    var currentWord = "";
    foreach (var chr in input)
    {
        if (char.IsLetter(chr))
        {
            currentWord += chr;
        }
        else if (currentWord.Length > 0)
        {
            if (result.ContainsKey(currentWord)) result[currentWord]++;
            else result.Add(currentWord, 1);
            currentWord = "";
        }
    }
    if (currentWord.Length > 0)
    {
        if (result.ContainsKey(currentWord)) result[currentWord]++;
        else result.Add(currentWord, 1);
    }
    return result;
}
```
To use it, you could do something like:
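```csharp
private static void Main(string[] args)
{
    var words = GetWords("a198$a1a1a'ač a");
    foreach (var word in words)
    {
        Console.WriteLine($"The word '{word.Key}' occurs {word.Value} times.");
    }
    // GetKeyFromUser was a helper not shown here; presumably it prompts and
    // waits for a key, which Console.Write plus Console.ReadKey also does.
    Console.Write("\nDone! Press any key to exit...");
    Console.ReadKey();
}
```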
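Note that char.IsLetter returns true for any Unicode letter, including 'č', so GetWords treats "ač" as a single word: on the sample input it counts "a" 5 times and "ač" once, rather than "a" 6 times as the question expects. A sketch that restricts the check to the English letters A-Z and a-z (AsciiWordCounter, IsEnglishLetter and AddWord are hypothetical names), and that buffers characters in a StringBuilder instead of concatenating strings:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class AsciiWordCounter
{
    // Hypothetical helper: true only for the 52 English letters A-Z and a-z.
    // char.IsLetter would also accept letters such as 'č'.
    static bool IsEnglishLetter(char c) => (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');

    public static Dictionary<string, int> GetWords(string input)
    {
        var result = new Dictionary<string, int>();
        if (string.IsNullOrEmpty(input)) return result;

        var currentWord = new StringBuilder();
        foreach (var chr in input)
        {
            if (IsEnglishLetter(chr))
                currentWord.Append(chr);   // still inside a word
            else
                AddWord(result, currentWord); // a non-letter ends the current word
        }
        AddWord(result, currentWord); // count a word that ends the input
        return result;
    }

    // Counts the buffered word (if any), then clears the buffer.
    static void AddWord(Dictionary<string, int> counts, StringBuilder buffer)
    {
        if (buffer.Length == 0) return;
        var word = buffer.ToString();
        counts.TryGetValue(word, out int n); // n is 0 when the word is new
        counts[word] = n + 1;
        buffer.Clear();
    }
}

static class Program
{
    static void Main()
    {
        var words = AsciiWordCounter.GetWords("a198$a1a1a'ač a");
        foreach (var word in words)
            Console.WriteLine($"The word '{word.Key}' occurs {word.Value} times.");
        // prints: The word 'a' occurs 6 times.
    }
}
```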