Kazekage Gaara

Reputation: 15052

Counting the occurrences of a particular String in a file

Here's the code I've been working on:

while ((lineContents = tempFileReader.readLine()) != null)
{
            String lineByLine = lineContents.replaceAll("/\\.", System.getProperty("line.separator")); //for matching /. and replacing it by new line
            changer.write(lineByLine);
            Pattern pattern = Pattern.compile("\\r?\\n"); //Find new line
            Matcher matcher = pattern.matcher(lineByLine);
            while(matcher.find())
            {
                Pattern tagFinder = Pattern.compile("word"); //Finding the word required
                Matcher tagMatcher = tagFinder.matcher(lineByLine);
                while(tagMatcher.find())
                {
                    score++;
                }
                scoreTracker.add(score);
                    score = 0;
            }   
}

My sample input contains 6 lines, with the occurrences of the word per line being [0,1,0,3,0,0]. So when I print scoreTracker (which is an ArrayList), I want that output. Instead, I get [4,4,4,4,4,4], which is the total number of occurrences of the word, not the count line by line. Kindly help.

Upvotes: 2

Views: 1230

Answers (5)

AlanS

Reputation: 61

The original code reads the input one line at a time using tempFileReader.readLine() and then looks for line endings within each line using matcher. Since lineContents contains only one line, matcher never finds a new line, so the rest of the code is skipped. Why do you need two different pieces of code to split the input into lines? You could remove the one that searches for new lines. For example:

while ((lineContents = tempFileReader.readLine()) != null)
{
      Pattern tagFinder = Pattern.compile("word"); //Finding the word required
      Matcher tagMatcher = tagFinder.matcher(lineContents);
      while(tagMatcher.find())
      {
          score++;
      }
      scoreTracker.add(score);
      score = 0;

}

I've tried the code above on Windows with a file test.txt read by a BufferedReader, set up like this:

BufferedReader tempFileReader = new BufferedReader(new FileReader("c:\\test\\test.txt"));

scoreTracker contains [0, 1, 0, 3, 0, 0] for a file which has the content you describe. I don't understand how you got [4,4,4,4,4,4] out of the original code if the sample input is an actual file as described and tempFileReader is a BufferedReader. It would be useful to see the code you use to set up tempFileReader.
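For anyone who wants to try the per-line approach without a file on disk, here is a minimal self-contained sketch. The class name LineWordCounter, the method countPerLine, and the sample text are made up for illustration, and a StringReader stands in for the FileReader. It also assumes the search text should be matched literally, hence Pattern.quote.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original question's code.
class LineWordCounter {

    // Returns one count per line read from the given source.
    static List<Integer> countPerLine(Reader source, String word) {
        // Compile once, outside the loop; Pattern.quote matches the text literally.
        Pattern tagFinder = Pattern.compile(Pattern.quote(word));
        List<Integer> scoreTracker = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                int score = 0; // fresh counter for every line
                Matcher tagMatcher = tagFinder.matcher(line);
                while (tagMatcher.find()) {
                    score++;
                }
                scoreTracker.add(score);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return scoreTracker;
    }

    public static void main(String[] args) {
        // Made-up six-line sample with the distribution described in the question.
        String sample = "alpha\nword here\nnothing\nword word word\nbeta\ngamma\n";
        System.out.println(countPerLine(new StringReader(sample), "word"));
        // prints [0, 1, 0, 3, 0, 0]
    }
}
```

Declaring the counter inside the loop removes the need to reset it by hand after each line.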

Upvotes: 1

Narendra Yadala

Reputation: 9664

lineByLine points to the entire contents of your file. That is the reason you are getting [4,4,4,4,4,4]. You need to take each line separately and run the word search on that line only, i.e. tagFinder.matcher(line) rather than tagFinder.matcher(lineByLine). The final code will look like this:

while ((lineContents = tempFileReader.readLine()) != null)
{
    String lineByLine = lineContents.replaceAll("/\\.", System.getProperty("line.separator")); //for matching /. and replacing it by new line
    changer.write(lineByLine);
    Pattern pattern = Pattern.compile(".*\\r?\\n"); //Find new line
    Matcher matcher = pattern.matcher(lineByLine);
    while(matcher.find())
    {
        Pattern tagFinder = Pattern.compile("word"); //Finding the word required
        //matcher.group() returns the input subsequence matched by the previous match.
        Matcher tagMatcher = tagFinder.matcher(matcher.group());
        while(tagMatcher.find())
        {
            score++;
        }
        scoreTracker.add(score);
        score = 0;
    }   
}

Upvotes: 3

Jakub Zaverka

Reputation: 8874

You can use the Scanner class. Set the Scanner's delimiter to the string you want to count, and then count how many tokens the Scanner finds.

You can also construct the Scanner directly from a FileInputStream.

The resulting code has only 9 lines:

File file = new File(fileName);
Scanner scanner = new Scanner(file);
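scanner.useDelimiter("your text here"); // the argument is treated as a regular expression
int occurrences = 0; // a local variable must be initialized before it is incremented
while(scanner.hasNext()){
     scanner.next();
     occurrences++; // counts tokens between delimiters, which can differ from the number of delimiter matches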
}
scanner.close();
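Counting delimiter-separated tokens is only an indirect measure of how often the text appears. As an alternative sketch, and not what the answer above does, Scanner.findAll (available since Java 9) can count the matches themselves. The class and method names below are made up for illustration, the Scanner reads from a String rather than a file, and the search text is assumed to be literal.

```java
import java.util.Scanner;
import java.util.regex.Pattern;

// Hypothetical helper class; assumes Java 9 or later for Scanner.findAll.
class ScannerOccurrenceCounter {

    // Counts every match of the literal text in the input, across all lines.
    static long countOccurrences(String input, String text) {
        Pattern literal = Pattern.compile(Pattern.quote(text));
        try (Scanner scanner = new Scanner(input)) {
            // findAll streams one MatchResult per match, ignoring the delimiter.
            return scanner.findAll(literal).count();
        }
    }

    public static void main(String[] args) {
        System.out.println(countOccurrences("word a word b\nno match\nword", "word"));
        // prints 3
    }
}
```

Note that this gives the total for the whole input, not the per-line breakdown the question asks for.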

Upvotes: 0

Untitled

Reputation: 790

This is because you are searching the same string (lineByLine) each time. What you probably intended was to search each line separately. I suggest you do:

    Pattern tagFinder = Pattern.compile("word"); //Finding the word required
    for(String line : lineByLine.split("\\n"))
    {
        Matcher tagMatcher = tagFinder.matcher(line);
        while(tagMatcher.find())
            score++;
        scoreTracker.add(score);
        score = 0;
    }
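The split-based idea can be tried on its own with a small self-contained sketch. The class name SplitLineCounter, the method countPerLine, and the sample string are made up for illustration. It splits on "\\r?\\n" so Windows line endings are handled too, and it assumes the search text is literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class illustrating the split-then-count approach.
class SplitLineCounter {

    // Splits the text into lines and counts matches of the word in each one.
    static List<Integer> countPerLine(String text, String word) {
        Pattern tagFinder = Pattern.compile(Pattern.quote(word)); // compiled once
        List<Integer> scoreTracker = new ArrayList<>();
        // String.split drops trailing empty strings, so a final line
        // terminator does not produce an extra empty line.
        for (String line : text.split("\\r?\\n")) {
            int score = 0;
            Matcher tagMatcher = tagFinder.matcher(line);
            while (tagMatcher.find()) {
                score++;
            }
            scoreTracker.add(score);
        }
        return scoreTracker;
    }

    public static void main(String[] args) {
        String sample = "word word\n \n word word\n \n word\n";
        System.out.println(countPerLine(sample, "word"));
        // prints [2, 0, 2, 0, 1]
    }
}
```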

Upvotes: 1

Boris Strandjev

Reputation: 46943

Maybe this code will help you:

    String str = "word word\n \n word word\n \n word\n";
    Pattern pattern = Pattern.compile("(.*)\\r?\\n"); //Find new line
    Matcher matcher = pattern.matcher(str);
    while(matcher.find())
    {
        Pattern tagFinder = Pattern.compile("word"); //Finding the word required
        Matcher tagMatcher = tagFinder.matcher(matcher.group());
        int score = 0;
        while(tagMatcher.find())
        {
            score++;
        }
        System.out.print(score + " ");
    }

The output is 2 0 2 0 1.

It is not highly optimized, but your problem was that you never restricted the inner matching to the current line, so it always scanned the whole string.

Upvotes: 1
