Reputation: 97
I am working on a problem in which I have to read a text file and, for each word, count its frequency and record the line numbers on which it appears.
For example, a text file that reads
"Hi my name is
Bob. This is
Cool"
Should return:
1 Hi 1
1 my 1
1 name 1
2 is 1 2
1 bob 2
1 this 2
1 cool 3
I am having trouble deciding how to store the line number, as well as the word frequency. I have tried a few different things, and so far this is where I am at.
Any help?
Dictionary<string, int> countDictionary = new Dictionary<string, int>();
Dictionary<string, List<int>> lineDictionary = new Dictionary<string, List<int>>();
List<string> lines = new List<string>();
System.IO.StreamReader file =
    new System.IO.StreamReader("Sample.txt");
//Creates a List of lines
string x;
while ((x = file.ReadLine()) != null)
{
    lines.Add(x);
}
foreach (var y in Enumerable.Range(0, lines.Count))
{
    foreach (var word in lines[y].Split())
    {
        if (!countDictionary.ContainsKey(word.ToLower()) && !lineDictionary.ContainsKey(word.ToLower()))
        {
            countDictionary.Add(word.ToLower(), 1);
            //lineDictionary.Add(word.ToLower(), /*what to put here*/);
        }
        else
        {
            //the key must be lower-cased here too, to match the key added above
            countDictionary[word.ToLower()] += 1;
            //ADD line to dictionary???
        }
    }
}
foreach (var pair in countDictionary) //WHAT TO PUT HERE to print both
{
    Console.WriteLine("{0} {1}", pair.Value, pair.Key);
}
file.Close();
System.Console.ReadLine();
Upvotes: 1
Views: 81
Reputation: 120450
You can pretty much do this with a single LINQ statement:
var processed =
    //get the lines of text as IEnumerable<string>
    File.ReadLines(@"myFilePath.txt")
        //get a word and a line number for every word
        //so you'll have a sequence of objects with 2 properties
        //word and lineNumber
        .SelectMany((line, lineNumber) => line.Split().Select(word => new { word, lineNumber }))
        //group these objects by their "word" property
        .GroupBy(x => x.word)
        //select what you need
        .Select(g => new
        {
            //number of objects in the group
            //i.e. the frequency of the word
            Count = g.Count(),
            //the actual word
            Word = g.Key,
            //a sequence of line numbers of each instance of the
            //word in the group
            Positions = g.Select(x => x.lineNumber)
        });
foreach (var entry in processed)
{
    Console.WriteLine("{0} {1} {2}",
        entry.Count,
        entry.Word,
        string.Join(" ", entry.Positions));
}
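I like 0-based counting, so the line numbers above start at 0. If you want them to start at 1 as in your example, add 1 to lineNumber in the SelectMany projection (for instance new { word, lineNumber = lineNumber + 1 }).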
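As a rough sketch of that adjustment, the variant below shifts the line numbers to start at 1 and lower-cases each word so that "This" and "this" end up in the same group, which is what the question's expected output suggests. The class name WordIndex and the method Build are hypothetical names chosen for illustration, and punctuation is not stripped (so "Bob." stays "bob."); treat it as one possible shape rather than a definitive implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordIndex
{
    // Hypothetical helper: returns one tuple per distinct word, holding
    // (frequency, lower-cased word, 1-based line numbers), in order of first appearance.
    public static List<Tuple<int, string, List<int>>> Build(IEnumerable<string> lines)
    {
        return lines
            // The two-argument SelectMany overload supplies the 0-based line index.
            .SelectMany((line, index) => line
                // A null separator array splits on whitespace; empty entries are dropped.
                .Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
                .Select(word => new { Word = word.ToLowerInvariant(), Line = index + 1 }))
            .GroupBy(x => x.Word)
            .Select(g => Tuple.Create(g.Count(), g.Key, g.Select(x => x.Line).ToList()))
            .ToList();
    }
}

// Possible usage (assumes a file named Sample.txt exists):
//   foreach (var e in WordIndex.Build(System.IO.File.ReadLines("Sample.txt")))
//       Console.WriteLine("{0} {1} {2}", e.Item1, e.Item2, string.Join(" ", e.Item3));
```

Dropping empty entries also keeps blank lines from contributing an empty "word" to the result.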
Upvotes: 3
Reputation: 150108
You are tracking two different properties of the entity "word" in two separate data structures. I would suggest creating a class to represent that entity, something like
public class WordStats
{
    public string Word { get; set; }
    public int Count { get; set; }
    public List<int> AppearsInLines { get; set; }
    //the constructor must have the same name as the class
    public WordStats()
    {
        AppearsInLines = new List<int>();
    }
}
Then track things in a
Dictionary<string, WordStats> wordStats = new Dictionary<string, WordStats>();
Use the word itself as the key. When you encounter a word, check whether the dictionary already contains a WordStats instance for that key. If so, get it and update its Count and AppearsInLines properties; if not, create a new instance and add it to the dictionary.
foreach (var y in Enumerable.Range(0, lines.Count()))
{
    foreach (var word in lines[y].Split())
    {
        WordStats wordStat;
        //look the word up in the wordStats dictionary declared above
        bool alreadyHave = wordStats.TryGetValue(word, out wordStat);
        if (alreadyHave)
        {
            wordStat.Count++;
            wordStat.AppearsInLines.Add(y);
        }
        else
        {
            wordStat = new WordStats();
            wordStat.Word = word;
            wordStat.Count = 1;
            wordStat.AppearsInLines.Add(y);
            wordStats.Add(word, wordStat);
        }
    }
}
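To cover the printing step the question asks about, here is a minimal end-to-end sketch of this approach. It makes a few assumptions that are not in the answer above: the dictionary uses StringComparer.OrdinalIgnoreCase so that "This" and "this" share one entry, line numbers are stored 1-based to match the expected output, and the names WordStatsDemo, Collect and Format are hypothetical helpers introduced only for illustration.

```csharp
using System;
using System.Collections.Generic;

public class WordStats
{
    public string Word { get; set; }
    public int Count { get; set; }
    public List<int> AppearsInLines { get; } = new List<int>();
}

public static class WordStatsDemo
{
    // Hypothetical helper: builds one WordStats entry per distinct word.
    public static Dictionary<string, WordStats> Collect(IList<string> lines)
    {
        // Assumption: keys compare case-insensitively.
        var wordStats = new Dictionary<string, WordStats>(StringComparer.OrdinalIgnoreCase);
        for (int y = 0; y < lines.Count; y++)
        {
            // A null separator array splits on whitespace; empty entries are dropped.
            foreach (var word in lines[y].Split((char[])null, StringSplitOptions.RemoveEmptyEntries))
            {
                WordStats stat;
                if (!wordStats.TryGetValue(word, out stat))
                {
                    stat = new WordStats { Word = word.ToLowerInvariant() };
                    wordStats.Add(word, stat);
                }
                stat.Count++;
                stat.AppearsInLines.Add(y + 1); // store 1-based line numbers
            }
        }
        return wordStats;
    }

    // Formats one entry as "count word line line ...".
    public static string Format(WordStats s)
    {
        return string.Format("{0} {1} {2}", s.Count, s.Word, string.Join(" ", s.AppearsInLines));
    }
}

// Possible usage:
//   foreach (var s in WordStatsDemo.Collect(lines).Values)
//       Console.WriteLine(WordStatsDemo.Format(s));
```

Because both properties live on one object, a single loop over the dictionary's values is enough to print the count and the line numbers together.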
Upvotes: 1