user266003

Counting the frequency of each word

There's a directory with a few text files. How do I count the frequency of each word in each file? A word is a sequence of characters that may contain letters, digits, and underscores.

Upvotes: 3

Views: 18710

Answers (5)

Mayank Singh

Reputation: 65

string input = File.ReadAllText(filename);
var arr = input.Split(' ');
// Count how often each word occurs in the string.
IDictionary<string, int> dict = new Dictionary<string, int>();
foreach (var item in arr)
{
    int count;
    if (dict.TryGetValue(item, out count))
        dict[item] = count + 1; // word seen before: increment its count
    else
        dict.Add(item, 1);      // first occurrence of this word
}

Upvotes: 1

nawfal

Reputation: 73253

There is a LINQ-based alternative which IMO is simpler. The key is to use the framework's built-in File.ReadLines (which reads the file lazily, one line at a time) together with string.Split.

private Dictionary<string, int> GetWordFrequency(string file)
{
    return File.ReadLines(file)
               .SelectMany(x => x.Split())
               .Where(x => x != string.Empty)
               .GroupBy(x => x)
               .ToDictionary(x => x.Key, x => x.Count());
}

To get frequencies from many files, you can add an overload that takes a params array.

private Dictionary<string, int> GetWordFrequency(params string[] files)
{
    return files.SelectMany(x => File.ReadLines(x))
                .SelectMany(x => x.Split())
                .Where(x => x != string.Empty)
                .GroupBy(x => x)
                .ToDictionary(x => x.Key, x => x.Count());
}
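
One caveat: x.Split() with no arguments splits on whitespace only, so punctuation stays attached to the words ("word," and "word" are counted separately), which does not quite match the question's definition of a word. A minimal sketch of the same LINQ pipeline using a \w+ regex instead might look like the following. The names WordFrequency and FromFiles are made up for illustration, and the case-insensitive grouping is an assumption about what is wanted, not something the question asks for.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordFrequency
{
    // \w+ matches runs of word characters: letters, digits and the underscore
    // (in .NET it also covers a few other Unicode connector characters).
    private static readonly Regex WordPattern = new Regex(@"\w+");

    // Hypothetical helper: counts words across one or more files.
    public static Dictionary<string, int> FromFiles(params string[] files)
    {
        return files.SelectMany(file => File.ReadLines(file))   // lazy, line by line
                    .SelectMany(line => WordPattern.Matches(line).Cast<Match>())
                    // Assumption: "Word" and "word" should be counted together.
                    .GroupBy(match => match.Value, StringComparer.OrdinalIgnoreCase)
                    .ToDictionary(group => group.Key,
                                  group => group.Count(),
                                  StringComparer.OrdinalIgnoreCase);
    }
}
```

Called as WordFrequency.FromFiles("file1.txt", "file2.txt"), this returns one combined dictionary for all the files passed in. Drop both StringComparer arguments if the counts should be case-sensitive.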

Upvotes: 3

Majid

Reputation: 3481

@aKzenT's answer is good, but it has a problem: the code never checks whether the word already exists in the dictionary. So I modified the code as follows:

private void countWordsInFile(string file, Dictionary<string, int> words)
{
    var content = File.ReadAllText(file);

    var wordPattern = new Regex(@"\w+");

    foreach (Match match in wordPattern.Matches(content))
    {
        if (!words.ContainsKey(match.Value))
            words.Add(match.Value, 1);
        else
            words[match.Value]++;
    }
}

Upvotes: 0

aKzenT

Reputation: 7915

Here is a solution that should count all the word frequencies in a file:

    private void countWordsInFile(string file, Dictionary<string, int> words)
    {
        var content = File.ReadAllText(file);

        var wordPattern = new Regex(@"\w+");

        foreach (Match match in wordPattern.Matches(content))
        {
            int currentCount=0;
            words.TryGetValue(match.Value, out currentCount);

            currentCount++;
            words[match.Value] = currentCount;
        }
    }

You can call this code like this:

        var words = new Dictionary<string, int>(StringComparer.CurrentCultureIgnoreCase);

        countWordsInFile("file1.txt", words);

After this, words will contain all the words in the file with their frequencies (e.g. words["test"] returns the number of times that "test" occurs in the file content). If you need to accumulate the results from more than one file, simply call the method for all files with the same dictionary. If you need separate results for each file, create a new dictionary each time and use a structure like the one @DarkGray suggested.
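
As a rough sketch of the per-file variant, assuming the files of interest are the *.txt files directly inside one directory: the names DirectoryWordCounter, CountWords and CountWordsPerFile are hypothetical, and the counting loop simply mirrors countWordsInFile above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

public static class DirectoryWordCounter
{
    private static readonly Regex WordPattern = new Regex(@"\w+");

    // Same counting logic as countWordsInFile, but returns a fresh dictionary per call.
    public static Dictionary<string, int> CountWords(string file)
    {
        var words = new Dictionary<string, int>(StringComparer.CurrentCultureIgnoreCase);
        foreach (Match match in WordPattern.Matches(File.ReadAllText(file)))
        {
            int currentCount;
            words.TryGetValue(match.Value, out currentCount); // stays 0 for a new word
            words[match.Value] = currentCount + 1;
        }
        return words;
    }

    // Hypothetical helper: maps each *.txt file path in the directory to its own word counts.
    // Assumption: only the top level of the directory is scanned, not subdirectories.
    public static Dictionary<string, Dictionary<string, int>> CountWordsPerFile(string directory)
    {
        var result = new Dictionary<string, Dictionary<string, int>>();
        foreach (var file in Directory.EnumerateFiles(directory, "*.txt"))
        {
            result[file] = CountWords(file);
        }
        return result;
    }
}
```

The result is keyed by file path, so the count of "test" in one particular file is result[path]["test"] (use TryGetValue if the word may be absent from that file).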

Upvotes: 10

Serj-Tm

Reputation: 16981

Word counting:

int WordCount(string text)
{
  var regex = new System.Text.RegularExpressions.Regex(@"\w+");

  var matches = regex.Matches(text);
  return matches.Count;     
}

Read text from file:

string text = File.ReadAllText(filename);

Word counting structure:

class FileWordInfo
{
  public Dictionary<string, int> WordCounts = new Dictionary<string, int>();
}

List<FileWordInfo> fileInfos = new List<FileWordInfo>();

Upvotes: 0
