Fred Enator

Reputation: 27

Search and count specific words from a text file

I would like to search a text file for a specific set of words (for now just one word, "Jude") and count how often they occur. This is my current code: I can read the file and split each line into words, but comparing those words to a target word is the problem. (At the moment it is set up to just count all words, and that output is correct.)

Many Thanks -Fred

        string theLine;
        string theFile;
        int counter = 0;
        string[] fields = null;
        string delim = " ,.";

        Console.WriteLine("Please enter a filename:");
        theFile = Console.ReadLine();

        System.IO.StreamReader sr =
               new System.IO.StreamReader(theFile);

        while (!sr.EndOfStream)
        {
            theLine = sr.ReadLine();
            // Trim() returns a new string, so the result must be assigned back
            theLine = theLine.Trim();
            fields = theLine.Split(delim.ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
            // Currently counts every word on the line, not just "Jude"
            counter += fields.Length;
        }

        sr.Close();
        Console.WriteLine("The word count is {0}", counter);
        Console.ReadLine();

Upvotes: 0

Views: 1514

Answers (2)

NetMage

Reputation: 26917

Using LINQ, you can enumerate the lines of the file, then count the number of occurrences of your word or words in each line and sum the counts together:

Console.WriteLine("Please enter a filename:");
var theFile = Console.ReadLine();

var delim = " ,.".ToCharArray();
var countWords = new HashSet<string>(new[] { "Jude" }.Select(w => w.ToUpperInvariant()));
var count = File.ReadLines(theFile).Select(l => l.Split(delim, StringSplitOptions.RemoveEmptyEntries).Count(w => countWords.Contains(w.ToUpperInvariant()))).Sum();
Console.WriteLine("The word count is {0}", count);

If you prefer @Dai's regex pattern approach, you can use it to count the occurrences in each line, still using LINQ to process the lines and sum the counts:

Console.WriteLine("Please enter a filename:");
var theFile = Console.ReadLine();

var countWords = new[] { "Jude" };
// Regex.Escape keeps any regex metacharacters in the words from being treated as pattern syntax
var wordPattern = new Regex(@"\b(?:"+String.Join("|", countWords.Select(w => Regex.Escape(w)))+@")\b", RegexOptions.Compiled|RegexOptions.IgnoreCase);
var count = File.ReadLines(theFile).Select(l => wordPattern.Matches(l).Count).Sum();
Console.WriteLine("The word count is {0}", count);

Upvotes: 2

Dai

Reputation: 155250

  • Avoid new object allocations inside tight loops, in particular:
    • Don't use String.Split(), as it allocates a new string for every word.
    • Avoid calling ToCharArray() on every iteration; call it once and cache the result.
  • Use a using statement to ensure IDisposable objects are always disposed.

I recommend using a Regex instead:

Regex regex = new Regex( @"\bJude\b", RegexOptions.Compiled | RegexOptions.IgnoreCase );

Int32 count = 0;
using( StreamReader rdr = new StreamReader( theFile ) )
{
    String line;
    while( ( line = rdr.ReadLine() ) != null )
    {
        count += regex.Matches( line ).Count;
    } 
}

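The \b escape matches a "word boundary": a position where a word character (letter, digit, or underscore) sits next to a non-word character such as a space or punctuation, or next to the start or end of the string. So it will match "Jude" in the following examples: "Jude", "Jude foo", "Foo Jude", "Hello. Jude." but not "JudeJude".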
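To check the word-boundary behaviour described above, here is a minimal sketch that runs the same kind of pattern against those sample strings. The class name WordBoundaryDemo and the helper CountWord are hypothetical names used only for illustration; they are not part of the answer's code.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordBoundaryDemo
{
    // Hypothetical helper (an assumption, not from the answer above):
    // counts whole-word, case-insensitive occurrences of `word` in `text`.
    public static int CountWord(string text, string word)
    {
        // Regex.Escape guards against regex metacharacters in the word.
        string pattern = @"\b" + Regex.Escape(word) + @"\b";
        return Regex.Matches(text, pattern, RegexOptions.IgnoreCase).Count;
    }

    public static void Main()
    {
        string[] samples =
        {
            "Jude", "Jude foo", "Foo Jude", "Hello. Jude.", "JudeJude", "hey jude, Jude!"
        };

        // Print each sample next to the number of whole-word matches found.
        foreach (string s in samples)
        {
            Console.WriteLine("{0,-18} -> {1}", s, CountWord(s, "Jude"));
        }
    }
}
```

Each of the first four samples yields one match, "JudeJude" yields none because there is no boundary between the two copies, and the last sample yields two because of RegexOptions.IgnoreCase.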

Upvotes: 1
