user1505713

Reputation: 607

java.util.Scanner: Behavior with whitespace-only input?

I wrote a small parser for a Java course that counts characters, words, and lines, much like wc. It uses the Scanner class:

import java.util.Scanner;
import java.io.*;

public class WC1 {

    public static void main(String[] args) throws Exception{

        File f = new File(args[0]);
        Scanner in = new Scanner(f);

        int c=0, w=0, l=0;

        while (in.hasNext()) {
            String line = in.nextLine();
            int N = line.length();

            boolean word = false;

            for (int i=0;i<N;i++) {
                char ch = line.charAt(i);
                if (ch == '\r' || ch=='\n') {
                    if (word == true) w++;
                    word = false; // never reached in practice: nextLine() strips line terminators
                }
                else if (ch == ' ' || ch == '\t') {
                    if (word == true) w++;
                    word = false;
                    c++;
                }
                else {
                    word = true;
                    c++;
                }
            }

            if (word == true) w++;
            word = false; // scanner consumes newline but does not return it
            c++; // scanner throws away the newline
            l++;
            System.out.println(line);
        }

        in.close();
        System.out.println("" + c + " characters");
        System.out.println("" + w + " words");
        System.out.println("" + l + " lines");
    }

}

I tested it with the three small input files below:

File1:

The reason for the exception is that you are calling keyIn.close() after you use the scanner once, which not only closes the Scanner but also System.in. The very next iteration you create a new Scanner which promptly blows up because System.in is now closed. To fix that, what you should do is only create a scanner once before you enter the while loop, and skip the close() call entirely since you don't want to close System.in.

After fixing that the program still won't work because of the == and != string comparisons you do. When comparing strings in Java you must use equals() to compare the string contents. When you use == and != you are comparing the object references, so these comparisons will always return false in your code. Always use equals() to compare strings.

java MyClass File1.dat

779 characters
136 words
3 lines

wc File1.dat

3     136     779 test.dat

File2:

cat
dog
goose chicken
 rat
dragon

crab

java MyClass File2.dat

47 characters
7 words
7 lines

wc File2.dat

7     7     47 File2.dat

But this doesn't work:

File3:

       |
      |
     |
    |
   |
  |
 |
|

java MyClass File3.dat

0 characters
0 words
0 lines

wc File3.dat

8     0     36 File3.dat

File3 consists only of spaces and newline characters; each pipe symbol marks the end of a line and is not part of the file.

What is happening here? Notice the empty line in File2. Why is the Scanner seemingly ignoring the spaces in File3?

Upvotes: 1

Views: 455

Answers (2)

Radiodef

Reputation: 37875

while (in.hasNext()) {
    String line = in.nextLine();

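Here you're checking the Scanner with hasNext() but advancing with nextLine(). The two are basically unrelated: hasNext() reports whether another token is available (a run of non-whitespace delimited by whitespace), while nextLine() returns the rest of the current line whether or not it contains a token. Your third file has lines but no tokens, so hasNext() returns false on the very first check and the loop never runs. The empty line in File2 is still counted because hasNext() only looks ahead: a later token ("crab") exists, so the loop continues and nextLine() returns the empty line. You should always check with the hasXXX method that matches the method you actually use to advance, in your case: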

while (in.hasNextLine()) {
    String line = in.nextLine();
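
To make the difference concrete, here is a minimal sketch that runs both checks on a whitespace-only string standing in for File3. The class name WhitespaceScanDemo and the helper methods countLines and hasAnyToken are made up for this illustration, and the input is a shortened stand-in, not the actual file.

```java
import java.util.Scanner;

public class WhitespaceScanDemo {

    // Counts lines by pairing hasNextLine() with nextLine().
    static int countLines(String text) {
        int lines = 0;
        try (Scanner in = new Scanner(text)) {
            while (in.hasNextLine()) {
                in.nextLine();
                lines++;
            }
        }
        return lines;
    }

    // Reports whether the Scanner can find any token in the input.
    // With the default delimiter, whitespace is skipped while looking.
    static boolean hasAnyToken(String text) {
        try (Scanner in = new Scanner(text)) {
            return in.hasNext();
        }
    }

    public static void main(String[] args) {
        // Three lines that contain only spaces (hypothetical stand-in for File3).
        String blank = "   \n  \n \n";

        System.out.println(hasAnyToken(blank)); // prints false
        System.out.println(countLines(blank));  // prints 3
    }
}
```

With hasNext() as the loop condition the body would never execute for this input, which matches the "0 lines" result in the question.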

Upvotes: 0

stevietheTV

Reputation: 512

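From the Scanner documentation: "A Scanner breaks its input into tokens using a delimiter pattern, which by default matches whitespace. The resulting tokens may then be converted into values of different types using the various next methods." File3 contains nothing but whitespace, so it contains no tokens, and hasNext() returns false immediately.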

http://docs.oracle.com/javase/7/docs/api/java/util/Scanner.html
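
As a rough illustration of that default tokenizing, the sketch below collects the tokens a Scanner finds in a string. The class name TokenDemo and the helper tokens are invented for this example, and the sample strings are loosely modeled on File2 and File3 rather than copied from them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class TokenDemo {

    // Collects every token the Scanner yields with its default
    // (whitespace) delimiter.
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        try (Scanner in = new Scanner(text)) {
            while (in.hasNext()) {
                out.add(in.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Spaces, newlines, and blank lines all count as delimiters.
        System.out.println(tokens("goose chicken\n rat\n\ncrab"));
        // prints [goose, chicken, rat, crab]

        // Whitespace-only input yields no tokens at all.
        System.out.println(tokens("   \n  \n").isEmpty()); // prints true
    }
}
```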

Upvotes: 0
