Thomas

Reputation: 211

Java Scanner Regex InputMismatchException

I've read a few other posts but can't figure out what is fundamentally wrong with the following code and why it throws an exception:

public static void main(String[] args) {

    Scanner scanner = new Scanner(" 0 B 2 # L");

    String first = scanner.next("[0-9]+ [abB#]");
    String second = scanner.next("[0-9]+ [abB#] [LR]");

    System.out.println(first); 
    System.out.println(second);

}

Expected output:

0 B
2 # L

Upvotes: 1

Views: 66

Answers (2)

Wiktor Stribiżew

Reputation: 626774

A Scanner splits its entire input into tokens using a delimiter. In your case the delimiter is whitespace (the default):

A Scanner breaks its input into tokens using a delimiter pattern, which by default matches whitespace.

@Andreas described the root cause of the exception.

Now, the solution is to use methods that disregard delimiters.

The findInLine(java.lang.String), findWithinHorizon(java.lang.String, int), and skip(java.util.regex.Pattern) methods operate independently of the delimiter pattern. These methods will attempt to match the specified pattern with no regard to delimiters in the input and thus can be used in special circumstances where delimiters are not relevant.

You can use Scanner.findInLine:

Scanner s = new Scanner(" 0 B 2 # L");
String first = s.findInLine("[0-9]+ [abB#]"); // => 0 B
System.out.println(first);
String second = s.findInLine("[0-9]+ [abB#] [LR]"); // => 2 # L
System.out.println(second);
s.close(); 
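One detail worth guarding against: findInLine returns null instead of throwing when nothing matches. A minimal sketch of a null-safe wrapper follows; the class name FindDemo, the helper findOrDefault, and the "<none>" fallback are made up for illustration and are not part of the Scanner API.

```java
import java.util.Scanner;

class FindDemo {

    // Hypothetical helper: returns the next match of regex in the current line,
    // or the fallback when findInLine reports no match by returning null.
    static String findOrDefault(Scanner scanner, String regex, String fallback) {
        String match = scanner.findInLine(regex);
        return match != null ? match : fallback;
    }

    public static void main(String[] args) {
        try (Scanner s = new Scanner(" 0 B 2 # L")) {
            System.out.println(findOrDefault(s, "[0-9]+ [abB#]", "<none>"));      // 0 B
            System.out.println(findOrDefault(s, "[0-9]+ [abB#] [LR]", "<none>")); // 2 # L
            // The input is exhausted here, so findInLine returns null.
            System.out.println(findOrDefault(s, "[0-9]+ [abB#]", "<none>"));      // <none>
        }
    }
}
```

The try-with-resources block closes the Scanner even if a later call throws.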


A simple Matcher regex solution:

String str = " 0 B 2 # L";
Pattern ptrn = Pattern.compile("[0-9]+ [abB#] [LR]|[0-9]+ [abB#]");
Matcher matcher = ptrn.matcher(str);
while (matcher.find()) {
    System.out.println(matcher.group(0));
}


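The longer alternative must come before the shorter one: regex alternation tries alternatives left to right and takes the first that matches, so with the shorter one first, "2 # L" would be matched as just "2 #".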
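To see why the order matters, here is a small sketch that runs both orderings against the same input. The class name AlternationOrderDemo and the helper findAll are illustrative names, not anything from the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationOrderDemo {

    // Hypothetical helper: collects every successive match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = " 0 B 2 # L";
        // Longer alternative first: the trailing " L" is captured.
        System.out.println(findAll("[0-9]+ [abB#] [LR]|[0-9]+ [abB#]", input)); // [0 B, 2 # L]
        // Shorter alternative first: it wins at "2 #", and " L" is left behind.
        System.out.println(findAll("[0-9]+ [abB#]|[0-9]+ [abB#] [LR]", input)); // [0 B, 2 #]
    }
}
```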

Upvotes: 2

Andreas

Reputation: 159086

Quoting javadoc of Scanner.next(String):

Returns the next token if it matches the pattern constructed from the specified string.

By default (and you didn't change that), the Scanner will split tokens on white-space, so the tokens returned by the Scanner are:

"0"
"B"
"2"
"#"
"L"

The first token is "0". Since next(String) matches the pattern against that single token, and "0" alone doesn't match "[0-9]+ [abB#]", you get an InputMismatchException.
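The tokenization described above can be checked directly. This is a minimal sketch; the class name TokenDemo and both helper methods are illustrative, not part of the question's code.

```java
import java.util.ArrayList;
import java.util.InputMismatchException;
import java.util.List;
import java.util.Scanner;

class TokenDemo {

    // Collects the tokens a Scanner produces with its default whitespace delimiter.
    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    // Returns true if next(pattern) rejects the first token with InputMismatchException.
    static boolean firstTokenMismatches(String input, String pattern) {
        try (Scanner scanner = new Scanner(input)) {
            scanner.next(pattern);
            return false;
        } catch (InputMismatchException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String input = " 0 B 2 # L";
        System.out.println(tokens(input));                                // [0, B, 2, #, L]
        // The pattern contains a space, but no single token ever does.
        System.out.println(firstTokenMismatches(input, "[0-9]+ [abB#]")); // true
        // A pattern that fits one token is accepted.
        System.out.println(firstTokenMismatches(input, "[0-9]+"));        // false
    }
}
```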

Solution

Don't use a Scanner.

Pattern p = Pattern.compile(" *([0-9]+ [abB#]) +([0-9]+ [abB#] [LR]) *");
Matcher m = p.matcher(" 0 B 2 # L");
if (m.matches()) {
    System.out.println(m.group(1));
    System.out.println(m.group(2));
}

Upvotes: 2
