J. Lastin

Reputation: 111

Textfile word frequency

I am supposed to read a text file and print the word frequencies in descending order. The catch is how a word is defined: "A word is considered to be any sequence of consecutive letters that is not preceded or followed by a letter".

Is there a way to define the word-break characters as anything not in the English alphabet, or to use Regex for this?

For example, the program should treat the string "a198$a1a1a'ač a" as the word "a" with a frequency of 6.

     {
        char[] wordBreak = new char[] { ' ', ',', ';', '.', '/', '\"', '[', ']', '!'};
        var wordFreq = new Dictionary<string, int>();
        using (var fileStream = File.Open("text.in", FileMode.Open, FileAccess.Read))
        using (var streamReader = new StreamReader(fileStream))
        {
            string line;
            while ((line = streamReader.ReadLine()) != null)
            {
                var words = line.Split(wordBreak, StringSplitOptions.RemoveEmptyEntries);

                foreach (var word in words)
                {
                    if (wordFreq.ContainsKey(word))
                    {
                        wordFreq[word]++;
                    }
                    else
                    {
                        wordFreq.Add(word, 1);
                    }

                }
            }
        }
     }

Upvotes: 0

Views: 170

Answers (3)

J. Lastin

Reputation: 111

Okay, I did this and it works, but there is probably a better way.

static void Main(string[] args)
{
    var wordFreq = new Dictionary<string, int>();
    using (var fileStream = File.Open("text.in", FileMode.Open, FileAccess.Read))
    using (var streamReader = new StreamReader(fileStream))
    {
        string line;
        while ((line = streamReader.ReadLine()) != null)
        {
            // Split on every run of characters that are not English letters.
            var words = Regex.Split(line, @"[^A-Za-z]+");
            foreach (var word in words)
            {
                // Regex.Split yields empty strings when the line starts or ends with a separator.
                if (word.Equals("")) { continue; }

                if (wordFreq.ContainsKey(word))
                {
                    wordFreq[word]++;
                }
                else
                {
                    wordFreq.Add(word, 1);
                }
            }
        }
    }
}
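
The question also asks for the counts in descending order, which building the dictionary alone does not provide. A minimal sketch of that last step, assuming the counts are already in a `Dictionary<string, int>` like `wordFreq`; the `FrequencySketch` class and `SortByFrequency` helper are hypothetical names introduced only for illustration, and the sample counts are made up:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class FrequencySketch
{
    // Hypothetical helper: returns the entries ordered by count, highest first.
    // Ties are broken by ordinal string order so the result is deterministic.
    public static List<KeyValuePair<string, int>> SortByFrequency(Dictionary<string, int> wordFreq)
    {
        return wordFreq
            .OrderByDescending(pair => pair.Value)
            .ThenBy(pair => pair.Key, StringComparer.Ordinal)
            .ToList();
    }

    static void Main()
    {
        // Made-up sample counts standing in for the dictionary built from the file.
        var wordFreq = new Dictionary<string, int> { ["the"] = 3, ["cat"] = 1, ["sat"] = 2 };

        foreach (var pair in SortByFrequency(wordFreq))
        {
            Console.WriteLine($"{pair.Key}: {pair.Value}");
        }
    }
}
```

Sorting once after the whole file has been read keeps the counting loop unchanged.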

Upvotes: 1

steve16351

Reputation: 5812

Yes, you can use Regex. For example:

MatchCollection matches = Regex.Matches("a198$a1a1a'ač a", "[a-zA-Z]+");

var wordFreqs = matches
    .Cast<Match>()
    .GroupBy(a => a.Value)
    .OrderByDescending(a => a.Count())
    .Select(a => new { Word = a.Key, Freq = a.Count() });

foreach (var wordFreq in wordFreqs)
    Console.WriteLine($"\"{wordFreq.Word}\" occurs {wordFreq.Freq} times");
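
To apply the same idea to the file from the question instead of a literal string, one possible sketch follows. It assumes the whole file fits in memory and that `text.in` sits in the working directory, as in the question. The `WordCounter` class and `CountWords` method are hypothetical names, and matching stays case-sensitive because the question does not say whether `The` and `the` should count as one word.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

static class WordCounter
{
    // Hypothetical helper: counts maximal runs of ASCII letters in the text
    // and returns them ordered by frequency, highest first.
    public static List<KeyValuePair<string, int>> CountWords(string text)
    {
        return Regex.Matches(text, "[A-Za-z]+")
            .Cast<Match>()
            .GroupBy(match => match.Value)
            .Select(group => new KeyValuePair<string, int>(group.Key, group.Count()))
            .OrderByDescending(pair => pair.Value)
            .ToList();
    }

    static void Main()
    {
        // "text.in" is the file name used in the question.
        string text = File.ReadAllText("text.in");

        foreach (var pair in CountWords(text))
        {
            Console.WriteLine($"\"{pair.Key}\" occurs {pair.Value} times");
        }
    }
}
```

For the sample string in the question, `CountWords` yields the single word `a` with a count of 6, since `č` falls outside `[A-Za-z]`.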

Upvotes: 2

Rufus L

Reputation: 37020

One way to do this would be to simply walk the string character by character. If the character is a letter, append it to a currentWord string. If it's not a letter and currentWord has some characters, then either add that word to a dictionary (with the value 1) or increment the count for that word if it already exists:

private static Dictionary<string, int> GetWords(string input)
{
    var result = new Dictionary<string, int>();
    if (string.IsNullOrWhiteSpace(input)) return result;

    var currentWord = "";

    foreach (var chr in input)
    {
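        // Only A-Z and a-z count as letters here; char.IsLetter would also accept characters such as 'č'.
        if ((chr >= 'a' && chr <= 'z') || (chr >= 'A' && chr <= 'Z'))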
        {
            currentWord += chr;
        }
        else if (currentWord.Length > 0)
        {
            if (result.ContainsKey(currentWord)) result[currentWord]++;
            else result.Add(currentWord, 1);
            currentWord = "";
        }
    }

    if (currentWord.Length > 0)
    {
        if (result.ContainsKey(currentWord)) result[currentWord]++;
        else result.Add(currentWord, 1);
    }

    return result;
}

In use, you would simply do something like:

private static void Main(string[] args)
{
    var words = GetWords("a198$a1a1a'ač a");

    foreach (var word in words)
    {
        Console.WriteLine($"The word '{word.Key}' occurs {word.Value} times.");
    }

    // Wait for a key press so the console window stays open.
    Console.WriteLine("\nDone! Press any key to exit...");
    Console.ReadKey();
}

Upvotes: 1
