Reputation: 89
public static void main(String[] args) throws FileNotFoundException
{
    String inputFileName = "textfile.txt";
    printFileStats(inputFileName);
}

public static void printFileStats(String fileName) throws FileNotFoundException
{
    String outputFileName = "outputtextfile.txt";
    File inputFile = new File(fileName);
    Scanner in = new Scanner(inputFile);
    PrintWriter out = new PrintWriter(outputFileName);
    int lines = 0;
    int words = 0;
    int characters = 0;
    while (in.hasNextLine())
    {
        lines++;
        while (in.hasNext())
        {
            in.next();
            words++;
        }
    }
    out.println("Lines: " + lines);
    out.println("Words: " + words);
    out.println("Characters: " + characters);
    in.close();
    out.close();
}
I have a text file containing five lines:
this is
a text
file
full of stuff
and lines
The code creates an output file containing:
Lines: 1
Words: 10
Characters: 0
However, if I remove the capability for reading the number of words in the file, it correctly states the number of lines (5). Why is this happening?
Upvotes: 1
Views: 121
Reputation: 20889
The reason is that hasNext() does not care about line breaks.
You enter the while(in.hasNextLine()) loop once, but then the inner while(in.hasNext()) loop consumes the whole file, resulting in 1 line and 10 words.
-> Detect the end-of-line characters yourself while reading tokens and increase the line count there. (With the default delimiter, the tokens returned by next() never contain line terminators, so this needs a custom delimiter.)
OR:
Use String line = scanner.nextLine() to obtain exactly ONE line, and then use a second scanner to fetch all tokens of that line: Scanner scanner2 = new Scanner(line); while(scanner2.hasNext()) { scanner2.next(); words++; }
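A minimal sketch of the second-scanner idea, assuming the input is passed in as a String rather than read from a file so the example stays self-contained. The class name LineWordCounter and the method count are made up for illustration:

```java
import java.util.Scanner;

class LineWordCounter {
    // Returns {lines, words} for the given text.
    static int[] count(String text) {
        int lines = 0;
        int words = 0;
        try (Scanner in = new Scanner(text)) {
            while (in.hasNextLine()) {
                // nextLine() consumes exactly one line, including its terminator.
                String line = in.nextLine();
                lines++;
                // A second scanner tokenizes only this line.
                try (Scanner lineScanner = new Scanner(line)) {
                    while (lineScanner.hasNext()) {
                        lineScanner.next();
                        words++;
                    }
                }
            }
        }
        return new int[] { lines, words };
    }

    public static void main(String[] args) {
        String text = "this is\na text\nfile\nfull of stuff\nand lines\n";
        int[] stats = count(text);
        System.out.println("Lines: " + stats[0]);
        System.out.println("Words: " + stats[1]);
    }
}
```

For the five-line sample from the question this reports 5 lines and 10 words. To read from a file instead, the outer scanner can be built with new Scanner(new File(fileName)) as in the question.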
Upvotes: 0
Reputation: 1553
Your inner while loop is gobbling up the whole file. You want to count the number of words in each line, right? Try this instead:
while (in.hasNextLine())
{
    lines++;
    String line = in.nextLine();
    for (String word : line.split("\\s"))
    {
        words++;
    }
}
Note that splitting on single whitespace characters is a very naive approach to tokenization (word-splitting) and will only work for simple examples like the one you have here; consecutive spaces or an empty line, for instance, produce empty strings that are counted as words.
Of course, you could also do words += line.split("\\s").length; instead of that inner loop.
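If the input may contain blank lines or runs of spaces, a slightly more forgiving variant is to trim the line and split on "\\s+". This is only a sketch of that idea, not a real tokenizer; SplitWordCount and countWords are hypothetical names:

```java
class SplitWordCount {
    // Counts whitespace-separated words in a single line.
    static int countWords(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            // "".split(...) would return one empty string, so handle it explicitly.
            return 0;
        }
        // "\\s+" treats a run of whitespace as one separator.
        return trimmed.split("\\s+").length;
    }

    public static void main(String[] args) {
        System.out.println(countWords("full of stuff"));
        System.out.println(countWords("  a   b "));
        System.out.println(countWords(""));
        // The single-character pattern counts the empty string between two spaces.
        System.out.println("a  b".split("\\s").length);
    }
}
```

In the loop above, words += countWords(line); would then replace the inner for loop.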
Upvotes: 4
Reputation: 33019
in.hasNext() and in.next() treat all whitespace characters as word separators, including newline characters. Your inner loop is eating all the newlines as it's counting all the words.
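To see this in isolation, here is a small sketch that collects every token from a multi-line string using only hasNext() and next(). The names TokenDemo and tokens are invented for the example, and the default Scanner delimiter is assumed:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class TokenDemo {
    // Collects all tokens using the default (whitespace) delimiter.
    static List<String> tokens(String text) {
        List<String> result = new ArrayList<>();
        try (Scanner in = new Scanner(text)) {
            while (in.hasNext()) {
                result.add(in.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "this is\na text\nfile\nfull of stuff\nand lines";
        List<String> all = tokens(text);
        // One loop walks straight across all five lines.
        System.out.println(all.size() + " tokens: " + all);
    }
}
```

The single loop returns all 10 words of the sample file, and none of the tokens contains a newline, which is why the outer hasNextLine() loop in the question only runs once.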
Upvotes: 1
Reputation: 23029
This reads the next token, not the next line:
in.next();
So it just reads token after token and doesn't care about line endings. A space or \n is usually treated as whitespace, so methods like this one make no distinction between them.
Upvotes: 0