I am trying to write a simple program that counts the words in a text file. The code is as follows:
import java.io.File; //to read file
import java.util.Scanner;

public class ReadTextFile {
    public static void main(String[] args) throws Exception {
        String filename = "textfile.txt";
        File f = new File(filename);
        Scanner scan = new Scanner(f);
        int wordCnt = 1;
        while (scan.hasNextLine()) {
            String text = scan.nextLine();
            for (int i = 0; i < text.length(); i++) {
                if (text.charAt(i) == ' ' && text.charAt(i - 1) != ' ') {
                    wordCnt++;
                }
            }
        }
        System.out.println("Word count is " + wordCnt);
    }
}
This code compiles but does not give the correct word count. What am I doing wrong?
First of all, remember to close your resources: the Scanner in the question is never closed.
Since Java 8, you can count words with streams in this way:
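import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.stream.Stream;

String regex = "\\s+";
String filename = "textfile.txt";
long wordCnt = 0;
// try-with-resources closes the stream (and the underlying file) automatically
try (Stream<String> lines = Files.lines(Paths.get(filename))) {
    wordCnt = lines
            .flatMap(line -> Arrays.stream(line.trim().split(regex))) // one element per word
            .filter(word -> !word.isEmpty()) // a blank line yields a single empty string
            .count();
} catch (IOException e) {
    e.printStackTrace();
}
System.out.println("Word count is " + wordCnt);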
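A shorter variant that keeps the Scanner, assuming Java 9 or later: Scanner.tokens() returns a stream of the tokens between delimiters (whitespace by default), so counting that stream gives the word count. The class name TokenCount and the helper countTokens are hypothetical names chosen for this sketch.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class TokenCount {
    // Counts the whitespace-separated tokens a Scanner produces.
    // Scanner.tokens() is available from Java 9 onwards.
    static long countTokens(Scanner scanner) {
        return scanner.tokens().count();
    }

    public static void main(String[] args) {
        // "textfile.txt" is the file name used in the question
        try (Scanner scanner = new Scanner(new File("textfile.txt"))) {
            System.out.println("Word count is " + countTokens(scanner));
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }
}
```

Because the Scanner skips leading delimiters and treats any run of whitespace (spaces, tabs, line breaks) as one separator, there is no need to handle line ends separately.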
Right now you are only incrementing wordCnt if the character you are on is a space and the character before it is not. However, this misses several cases, such as a word that is followed by a line break instead of a space: nextLine() strips the line terminator, so the last word on each line is never followed by a space. Consider a file like:
This is a text file\n
with a bunch of\n
words.
Your method should return 10, but since there is no space after the words "file" and "of", it does not count them as words, so the program prints 8.
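If you prefer to keep the character-by-character approach, one way to fix it is to count where words start rather than where they end: a word starts at a non-whitespace character that is either the first character of the line or follows whitespace. This also avoids a second bug in the question's code, where a line beginning with a space makes text.charAt(i - 1) run with i == 0 and throw StringIndexOutOfBoundsException. This is a minimal sketch; CharWordCount and countWordsInLine are hypothetical names, and the lines are hard-coded instead of read from a file.

```java
public class CharWordCount {
    // Counts words in one line by counting word starts: a non-whitespace
    // character that is the first character or follows whitespace.
    // No trailing space is needed, and charAt(-1) is never read.
    static int countWordsInLine(String line) {
        int count = 0;
        boolean inWord = false;
        for (int i = 0; i < line.length(); i++) {
            boolean whitespace = Character.isWhitespace(line.charAt(i));
            if (!whitespace && !inWord) {
                count++; // entering a new word
            }
            inWord = !whitespace;
        }
        return count;
    }

    public static void main(String[] args) {
        // the example file from above, one array element per line
        String[] lines = {"This is a text file", "with a bunch of", "words."};
        int wordCnt = 0; // start at 0, not 1
        for (String line : lines) {
            wordCnt += countWordsInLine(line);
        }
        System.out.println("Word count is " + wordCnt); // Word count is 10
    }
}
```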
If you just want the word count, you can do something along these lines:
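// wordCnt must start at 0 for this count to be correct
while (scan.hasNextLine()) {
    String text = scan.nextLine().trim();
    if (!text.isEmpty()) { // a blank line would otherwise count as one word
        wordCnt += text.split("\\s+").length;
    }
}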
This splits each line on runs of whitespace and adds the number of tokens in the resulting array.
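String.split has two edge cases that make a plain text.split("\\s+").length over-count: an empty string splits into one empty token, and leading whitespace produces an empty first token. The sketch below demonstrates both and guards against them with trim() and an emptiness check; SplitCount and countWords are hypothetical names used only for illustration.

```java
public class SplitCount {
    // Counts words in one line with String.split, guarding the two
    // cases where split alone over-counts.
    static int countWords(String line) {
        String trimmed = line.trim();
        // "".split("\\s+") returns [""] (length 1), so skip blank lines
        if (trimmed.isEmpty()) {
            return 0;
        }
        // trimming first avoids the empty leading token from " a"
        return trimmed.split("\\s+").length;
    }

    public static void main(String[] args) {
        System.out.println("".split("\\s+").length);   // 1
        System.out.println(" a".split("\\s+").length); // 2
        System.out.println(countWords(""));            // 0
        System.out.println(countWords(" a"));          // 1
    }
}
```

Trailing whitespace is not a problem on its own, because split drops trailing empty strings; only the leading and blank-line cases need the guard.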