Levi J

Reputation: 97

Storing Word Line and Frequency based on Word

I am working on a problem in which I have to read a text file and, for each word, report its frequency and the line numbers on which it appears.

So for example, a txt file that reads

"Hi my name is

Bob. This is 

Cool"

Should return:

1 hi 1

1 my 1

1 name 1

2 is 1 2

1 bob 2

1 this 2

1 cool 3

I am having trouble deciding how to store the line number, as well as the word frequency. I have tried a few different things, and so far this is where I am at.

Any help?

        Dictionary<string, int> countDictionary = new Dictionary<string,int>();
        Dictionary<string, List<int>> lineDictionary = new Dictionary<string, List<int>>();

        List<string> lines = new List<string>();


        System.IO.StreamReader file =
                new System.IO.StreamReader("Sample.txt");

        //Creates a List of lines
        string x;
        while ((x = file.ReadLine()) != null)
        {
            lines.Add(x);
        }

        foreach(var y in Enumerable.Range(0,lines.Count()))
        {
            foreach(var word in lines[y].Split())
            {
                if(!countDictionary.Keys.Contains(word.ToLower()) && !lineDictionary.Keys.Contains(word.ToLower()))
                {
                    countDictionary.Add(word.ToLower(), 1);
                    //lineDictionary.Add(word.ToLower(), /*what to put here*/);
                }
                else
                {
                    countDictionary[word.ToLower()] += 1;
                    //ADD line to dictionary???
                }
            }
        }



       foreach (var pair in countDictionary)//WHAT TO PUT HERE to print both 
       {
           Console.WriteLine("{0}  {1}", pair.Value, pair.Key);
       }

        file.Close();


        System.Console.ReadLine();

Upvotes: 1

Views: 81

Answers (2)

spender

Reputation: 120450

You can pretty much do this with a single LINQ query:

var processed =
  //get the lines of text as IEnumerable<string> 
  File.ReadLines(@"myFilePath.txt")
    //get a word and a line number for every word
    //so you'll have a sequence of objects with 2 properties
    //word and lineNumber
    .SelectMany((line, lineNumber) => line.Split().Select(word => new{word, lineNumber}))
    //group these objects by their "word" property
    .GroupBy(x => x.word)
    //select what you need
    .Select(g => new{
        //number of objects in the group
        //i.e. the frequency of the word
        Count = g.Count(), 
        //the actual word
        Word = g.Key, 
        //a sequence of line numbers of each instance of the 
        //word in the group
        Positions = g.Select(x => x.lineNumber)});

foreach(var entry in processed)
{
    Console.WriteLine("{0} {1} {2}",
                      entry.Count,
                      entry.Word,
                      string.Join(" ",entry.Positions));
}

I like 0-based counting, so the line numbers above start at 0; add 1 to lineNumber in the SelectMany projection if you want them to start at 1.
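For reference, here is a minimal sketch of the same query adjusted to match the output format in the question: 1-based line numbers, lowercased words, and empty entries skipped. The class name `WordIndexLinq`, the method `Summarize`, and the `Punctuation` list are hypothetical names introduced only for this illustration. Trimming punctuation is an assumption about what the asker wants ("Bob." counted as "bob").

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordIndexLinq
{
    // Assumption: these characters are stripped from the ends of each word.
    private static readonly char[] Punctuation = { '.', ',', ';', ':', '!', '?', '"' };

    // Returns one "count word line line ..." string per distinct word,
    // in order of first appearance.
    public static List<string> Summarize(IEnumerable<string> lines)
    {
        return lines
            .SelectMany((line, index) => line
                // A null separator array means "split on whitespace".
                .Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
                .Select(raw => raw.Trim(Punctuation).ToLowerInvariant())
                .Where(word => word.Length > 0)
                // index is 0-based, so add 1 for a human-friendly line number.
                .Select(word => new { Word = word, LineNumber = index + 1 }))
            .GroupBy(x => x.Word)
            .Select(g => string.Format("{0} {1} {2}",
                g.Count(),
                g.Key,
                string.Join(" ", g.Select(x => x.LineNumber))))
            .ToList();
    }
}
```

It could be called as `WordIndexLinq.Summarize(File.ReadLines("Sample.txt"))`, printing each returned string with `Console.WriteLine`.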

Upvotes: 3

Eric J.

Reputation: 150108

You are tracking two different properties of the entity "word" in two separate data structures. I would suggest creating a class to represent that entity, something like

public class WordStats
{
    public string Word { get; set; }
    public int Count { get; set; }
    public List<int> AppearsInLines { get; set; }
    public WordStats()
    {
        AppearsInLines = new List<int>();
    }
}

Then track things in a

Dictionary<string, WordStats> wordStats = new Dictionary<string, WordStats>();

Use the word itself as the key. When you encounter a word, check whether the dictionary already holds a WordStats instance for that key. If so, get it and update its Count and AppearsInLines properties; if not, create a new instance and add it to the dictionary.

foreach(var y in Enumerable.Range(0,lines.Count()))
{
    foreach(var word in lines[y].Split())
    {
        WordStats wordStat;
        bool alreadyHave = wordStats.TryGetValue(word, out wordStat);
        if (alreadyHave)
        {
            // Seen before: bump the count and record this line
            wordStat.Count++;
            wordStat.AppearsInLines.Add(y);
        }
        else
        {
            // First occurrence: create the entry and register it
            wordStat = new WordStats();
            wordStat.Word = word;
            wordStat.Count = 1;
            wordStat.AppearsInLines.Add(y);
            wordStats.Add(word, wordStat);
        }
    }
}
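Putting the pieces together, a self-contained sketch of this approach might look like the following. The `WordIndexer` class and its `Build` method are hypothetical names used only for illustration. It assumes case-insensitive matching (via `StringComparer.OrdinalIgnoreCase` rather than repeated `ToLower` calls) and 1-based line numbers, and it does not strip punctuation.

```csharp
using System;
using System.Collections.Generic;

public class WordStats
{
    public string Word { get; set; }
    public int Count { get; set; }
    public List<int> AppearsInLines { get; private set; }

    public WordStats()
    {
        AppearsInLines = new List<int>();
    }
}

public static class WordIndexer
{
    // Builds one WordStats entry per distinct word (ignoring case).
    public static Dictionary<string, WordStats> Build(IEnumerable<string> lines)
    {
        var stats = new Dictionary<string, WordStats>(StringComparer.OrdinalIgnoreCase);
        int lineNumber = 0;
        foreach (string line in lines)
        {
            lineNumber++; // 1-based line numbers
            // A null separator array means "split on whitespace".
            foreach (string word in line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries))
            {
                WordStats entry;
                if (!stats.TryGetValue(word, out entry))
                {
                    // First occurrence: create and register the entry.
                    entry = new WordStats { Word = word.ToLowerInvariant() };
                    stats.Add(word, entry);
                }
                entry.Count++;
                entry.AppearsInLines.Add(lineNumber);
            }
        }
        return stats;
    }
}
```

To print the result in the format from the question, iterate over `WordIndexer.Build(File.ReadLines("Sample.txt")).Values` and write `Count`, `Word`, and `string.Join(" ", AppearsInLines)` for each entry.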

Upvotes: 1
