Reputation: 27
I would like to search a file for a specific set of words (for now just one word, "Jude"). This is my current code: it reads the file and separates the words, but comparing them to a target word is the problem. (At the moment it is set up to just count all the words, and that output is correct.)
Many Thanks -Fred
String theLine;
string theFile;
int counter = 0;
string[] fields = null;
string delim = " ,.";
Console.WriteLine("Please enter a filename:");
theFile = Console.ReadLine();
System.IO.StreamReader sr =
new System.IO.StreamReader(theFile);
while (!sr.EndOfStream)
{
theLine = sr.ReadLine();
theLine = theLine.Trim(); // Trim() returns a new string, so the result must be assigned
fields = theLine.Split(delim.ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
counter += fields.Length;
}
sr.Close();
Console.WriteLine("The word count is {0}", counter);
Console.ReadLine();
}
Upvotes: 0
Views: 1514
Reputation: 26917
Using LINQ, you can enumerate the lines of the file, then count the number of occurrences of your word or words in each line and sum the counts together:
Console.WriteLine("Please enter a filename:");
var theFile = Console.ReadLine();
var delim = " ,.".ToCharArray();
var countWords = new HashSet<string>(new[] { "Jude" }.Select(w => w.ToUpperInvariant()));
var count = File.ReadLines(theFile).Select(l => l.Split(delim, StringSplitOptions.RemoveEmptyEntries).Count(w => countWords.Contains(w.ToUpperInvariant()))).Sum();
Console.WriteLine("The word count is {0}", count);
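As a variation on the snippet above, here is a minimal sketch that gives the set a case-insensitive comparer instead of uppercasing every token. The names `WordCounter` and `CountWords` are hypothetical (they are not from the question or the answer), and the sample lines are made up so the example runs without a file. `StringComparer.OrdinalIgnoreCase` is assumed to be an acceptable stand-in for the `ToUpperInvariant()` comparison.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class, for illustration only.
public static class WordCounter
{
    // Same delimiters as in the question: space, comma, period.
    private static readonly char[] Delimiters = { ' ', ',', '.' };

    // Counts the tokens in the given lines that equal any target word, ignoring case.
    public static int CountWords(IEnumerable<string> lines, IEnumerable<string> words)
    {
        // The comparer makes Contains() case-insensitive, so tokens need no uppercasing.
        var targets = new HashSet<string>(words, StringComparer.OrdinalIgnoreCase);

        return lines.Sum(line =>
            line.Split(Delimiters, StringSplitOptions.RemoveEmptyEntries)
                .Count(token => targets.Contains(token)));
    }

    public static void Main()
    {
        // Made-up sample input; "Judea" is a different token and is not counted.
        var sample = new[] { "Hey Jude, don't make it bad.", "jude. JUDE, Judea" };
        Console.WriteLine(CountWords(sample, new[] { "Jude" })); // prints 3
    }
}
```

To run it against a real file, `File.ReadLines(theFile)` could be passed as the first argument in place of the sample array.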
If you prefer @Dai's regex pattern approach, you can use it to count the occurrences in each line, still using LINQ to process the lines and sum the counts:
Console.WriteLine("Please enter a filename:");
var theFile = Console.ReadLine();
var countWords = new[] { "Jude" };
// Regex.Escape guards against words that contain regex metacharacters
var wordPattern = new Regex(@"\b(?:" + String.Join("|", countWords.Select(Regex.Escape)) + @")\b", RegexOptions.Compiled | RegexOptions.IgnoreCase);
var count = File.ReadLines(theFile).Select(l => wordPattern.Matches(l).Count).Sum();
Console.WriteLine("The word count is {0}", count);
Upvotes: 2
Reputation: 155250
Avoid String.Split(), as it causes excess string allocation.
Avoid calling ToCharArray() repeatedly too; you can just cache the result.
Use using() to ensure IDisposable objects are always disposed.
I recommend using a Regex instead:
Regex regex = new Regex( @"\bJude\b", RegexOptions.Compiled | RegexOptions.IgnoreCase );
Int32 count = 0;
using( StreamReader rdr = new StreamReader( theFile ) )
{
String line;
while( ( line = rdr.ReadLine() ) != null )
{
count += regex.Matches( line ).Count;
}
}
The \b escape matches a "word boundary", such as the start or end of a string or a punctuation mark next to a word, so it will match "Jude" in the following examples: "Jude", "Jude foo", "Foo Jude", "Hello. Jude." but not "JudeJude".
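The word-boundary behaviour can be checked with a small sketch that runs the same pattern over those example strings. The class name `WordBoundaryDemo` and the helper `CountMatches` are hypothetical names used only for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class, for illustration only.
public static class WordBoundaryDemo
{
    // Same pattern as in the answer: "Jude" as a whole word, ignoring case.
    private static readonly Regex JudePattern =
        new Regex(@"\bJude\b", RegexOptions.IgnoreCase);

    // Returns how many whole-word matches the input contains.
    public static int CountMatches(string input) => JudePattern.Matches(input).Count;

    public static void Main()
    {
        string[] samples = { "Jude", "Jude foo", "Foo Jude", "Hello. Jude.", "JudeJude" };

        // The first four samples each contain one match; "JudeJude" contains none,
        // because there is no boundary between the two word characters in the middle.
        foreach (var sample in samples)
        {
            Console.WriteLine($"\"{sample}\" -> {CountMatches(sample)}");
        }
    }
}
```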
Upvotes: 1