Hifza Rahim

Reputation: 41

Word Count from a text file using Java

I am trying to write a simple code that will give me the word count from a text file. The code is as follows:

import java.io.File; //to read file
import java.util.Scanner;

public class ReadTextFile {
   public static void main(String[] args) throws Exception { 
      String filename = "textfile.txt";
      File f = new File (filename);
      Scanner scan = new Scanner(f);
      int wordCnt = 1;

      while(scan.hasNextLine()) {
          String text = scan.nextLine();
          for (int i = 0; i < text.length(); i++) {
              if(text.charAt(i) == ' ' && text.charAt(i-1) != ' ') {
                  wordCnt++;
              }
          }
      }
      System.out.println("Word count is " + wordCnt);
   }

}

This code compiles but does not give the correct word count. What am I doing wrong?

Upvotes: 1

Views: 2425

Answers (2)

jker

Reputation: 465

First of all, remember to close resources: the Scanner in the question is never closed. A try-with-resources statement closes it automatically.
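A minimal sketch of closing the Scanner with try-with-resources (available since Java 7). The class name ClosingScanner is hypothetical, and counting with Scanner's hasNext()/next(), which split on whitespace by default, is a swapped-in technique rather than the question's character loop.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class ClosingScanner {
    // Counts whitespace-separated tokens in a file.
    static int countWords(File f) throws FileNotFoundException {
        int wordCnt = 0;
        // try-with-resources closes the Scanner (and its file handle)
        // when the block ends, even if an exception is thrown inside it
        try (Scanner scan = new Scanner(f)) {
            // the default delimiter is whitespace, so next() returns one word at a time
            while (scan.hasNext()) {
                scan.next();
                wordCnt++;
            }
        } // scan.close() is called automatically here
        return wordCnt;
    }

    public static void main(String[] args) throws FileNotFoundException {
        System.out.println("Word count is " + countWords(new File("textfile.txt")));
    }
}
```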

Since Java 8 you can count words in this way:

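import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

String regex = "\\s+";
String filename = "textfile.txt";

long wordCnt = 0;
// Files.lines reads the file lazily; try-with-resources closes it afterwards
try (Stream<String> lines = Files.lines(Paths.get(filename))) {
        wordCnt = lines
                .map(String::trim)                            // drop leading/trailing whitespace
                .filter(line -> !line.isEmpty())              // blank lines contain no words
                .mapToLong(line -> line.split(regex).length)  // number of words on this line
                .sum();
} catch (IOException e) {
        e.printStackTrace();
}

System.out.println("Word count is " + wordCnt);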
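One detail worth checking in any stream version: map(line -> line.split(regex)) produces one String[] per line, so calling count() directly on that stream counts lines, not words. The sketch below contrasts the two, using an in-memory list of lines instead of a file; the class StreamWordCount and its method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class StreamWordCount {
    // map(...) yields one String[] per line, so count() returns the number of lines.
    static long countLines(List<String> lines) {
        return lines.stream().map(line -> line.trim().split("\\s+")).count();
    }

    // flatMap(Arrays::stream) flattens the arrays into one stream of words;
    // the filter drops the "" token that splitting an empty line produces.
    static long countWords(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.trim().split("\\s+")))
                .filter(word -> !word.isEmpty())
                .count();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("This is a text file", "with a bunch of", "words.");
        System.out.println(countLines(lines)); // 3
        System.out.println(countWords(lines)); // 10
    }
}
```

Summing the array lengths with mapToLong(...).sum() gives the same word count as flatMap(...).count(), as long as blank lines are filtered out first.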

Upvotes: 0

GBlodgett

Reputation: 12819

Right now you only increment wordCnt when the current character is a space and the character before it is not. This misses several cases, for example a word followed by a line break instead of a space: nextLine() strips the line terminator, so the last word on each line is never followed by a space. Consider a file that looks like this:

This is a text file\n
with a bunch of\n
words. 

Your method should return 10, but because there is no space after the words "file" and "of", it will not count them as words. (Starting wordCnt at 1 only makes up for the very last word in the file.)
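To see the effect, here is a small sketch that applies the question's counting rule to the three sample lines, held in an array rather than read from a file so the example runs on its own. The class OriginalLoopDemo is hypothetical, and the i > 0 check is an addition: without it, a line that begins with a space makes the original code call charAt(-1), which throws StringIndexOutOfBoundsException.

```java
public class OriginalLoopDemo {
    // Same rule as the question: start at 1 and add one for every space
    // whose preceding character is not a space.
    static int countLikeQuestion(String[] lines) {
        int wordCnt = 1;
        for (String text : lines) {
            for (int i = 0; i < text.length(); i++) {
                // i > 0 added so charAt(i - 1) is never called with -1
                if (text.charAt(i) == ' ' && i > 0 && text.charAt(i - 1) != ' ') {
                    wordCnt++;
                }
            }
        }
        return wordCnt;
    }

    public static void main(String[] args) {
        String[] lines = {"This is a text file", "with a bunch of", "words."};
        // 1 + 4 spaces + 3 spaces + 0 spaces = 8, although the text has 10 words
        System.out.println(countLikeQuestion(lines));
    }
}
```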

If you just want the word count, you can do something along these lines:

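int wordCnt = 0; // start at 0: split() below counts every word, including the last one
while (scan.hasNextLine()) {
   String text = scan.nextLine().trim(); // leading whitespace would produce an empty first token
   if (!text.isEmpty()) {                // "".split("\\s+") has length 1, so skip blank lines
      wordCnt += text.split("\\s+").length;
   }
}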

This splits each line on runs of whitespace and adds the number of tokens in the resulting array to wordCnt.
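Put together as a complete program, a sketch could look like this. The class and method names are hypothetical; countWords takes a Scanner so that it works on a file or, for testing, on a String (the Scanner(String) constructor scans the string's contents). The trim() and isEmpty() checks guard against blank lines and leading whitespace, both of which would otherwise add a spurious empty token.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class SplitWordCount {
    // Counts words by splitting every line on runs of whitespace.
    static int countWords(Scanner scan) {
        int wordCnt = 0;
        while (scan.hasNextLine()) {
            // trim() removes leading whitespace, which would otherwise make
            // split() return an empty first token
            String text = scan.nextLine().trim();
            // "".split("\\s+") returns {""} (length 1), so skip blank lines
            if (!text.isEmpty()) {
                wordCnt += text.split("\\s+").length;
            }
        }
        return wordCnt;
    }

    public static void main(String[] args) throws FileNotFoundException {
        // try-with-resources closes the Scanner when the block ends
        try (Scanner scan = new Scanner(new File("textfile.txt"))) {
            System.out.println("Word count is " + countWords(scan));
        }
    }
}
```

For example, countWords(new Scanner("This is a text file\nwith a bunch of\nwords.")) returns 10.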

Upvotes: 2
