bitcasual

Reputation: 83

Java regex Matcher: exception on unknown character

So I have a String I want to split into tokens of different types as part of a larger Parser.

String input = "45 + 31.05 * 110 @ 54";

I use Java's regex classes Pattern and Matcher to interpret my regexes and find matches.

String floatRegex = "[0-9]+(\\.([0-9])+)?";
String additionRegex = "[+]";
String multiplicationRegex = "[*]";
String integerRegex = "[0-9]+";

All my regexes get merged into a single master regex, with pipe symbols between the individual regexes.

String masterOfRegexes = "[0-9]+(\\.([0-9])+)?|[+]|[*]|[0-9]+";

I pass this pattern to Pattern.compile() and get the matcher. As I step through the input from left to right by calling matcher.find(), I expect to get the structure below, up to the "@" symbol, where an InvalidInputException should be thrown.

[
  ["Integer": "45"],
  ["addition": "+"],
  ["Float": "31.05"],
  ["multiplication": "*"],
  ["Integer": "110"]
  Exception should be thrown...
]

The problem is that matcher.find() skips the "@" symbol completely and instead finds the next Integer past the "@", which is "54".

Why does it skip the "@" symbol and how can I make it so the exception gets thrown on a character it doesn't recognize from my pattern?

Upvotes: 1

Views: 435

Answers (2)

Joop Eggen

Reputation: 109557

Matcher offers three ways to match:

  • matches: the pattern must match the entire input
  • find: the pattern may match anywhere in the remaining input
  • lookingAt: the pattern must match from the start, but not necessarily to the end

Your use of find skipped the "@". Use the rarely used lookingAt, or check the start/end positions that find reports.
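As a rough sketch of the lookingAt approach: the matcher can be restricted to the unread part of the input with region(), so that lookingAt() only succeeds when a token begins exactly at the current position. The class name LookingAtTokenizer and the tokenize method are made up for illustration, whitespace is assumed to be a plain separator, and IllegalArgumentException stands in for the question's own InvalidInputException.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the question's code.
class LookingAtTokenizer {
    // Same alternatives as the question's master regex (the trailing,
    // unreachable integer alternative is left out).
    private static final Pattern TOKEN =
            Pattern.compile("[0-9]+(\\.[0-9]+)?|[+]|[*]");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            // Assumption: whitespace only separates tokens and is skipped.
            if (Character.isWhitespace(input.charAt(pos))) {
                pos++;
                continue;
            }
            // Restrict matching to the unread part of the input;
            // lookingAt() succeeds only if a token starts exactly at pos.
            matcher.region(pos, input.length());
            if (!matcher.lookingAt()) {
                // Stand-in for the question's InvalidInputException.
                throw new IllegalArgumentException(
                        "Unexpected character '" + input.charAt(pos)
                        + "' at index " + pos);
            }
            tokens.add(matcher.group());
            pos = matcher.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("45 + 31.05 * 110"));
        try {
            tokenize("45 + 31.05 * 110 @ 54");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Because the position only ever advances past whitespace or a matched token, nothing can be passed over silently.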

Upvotes: 0

The fourth bird

Reputation: 163362

A regex matches or it does not match. In your example data, it does not skip over the @, it just does not match it.
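One way to see this, sketched under a few assumptions: find() reports where each match starts and ends, so any non-whitespace text between the end of one match and the start of the next is input that no alternative matched. The class FindGapCheck and its methods are hypothetical names, and whitespace is assumed to be the only text that may legitimately sit between tokens.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that inspects the gaps between find() matches.
class FindGapCheck {
    private static final Pattern TOKEN =
            Pattern.compile("[0-9]+(\\.[0-9]+)?|[+]|[*]");

    /**
     * Returns the index of the first non-whitespace character that lies
     * between (or after) the matches found by find(), or -1 if none.
     */
    static int firstUnmatchedIndex(String input) {
        Matcher matcher = TOKEN.matcher(input);
        int previousEnd = 0;
        while (matcher.find()) {
            // Inspect the text between the previous match and this one.
            int bad = firstNonWhitespace(input, previousEnd, matcher.start());
            if (bad >= 0) {
                return bad;
            }
            previousEnd = matcher.end();
        }
        // Inspect whatever is left after the last match.
        return firstNonWhitespace(input, previousEnd, input.length());
    }

    private static int firstNonWhitespace(String s, int from, int to) {
        for (int i = from; i < to; i++) {
            if (!Character.isWhitespace(s.charAt(i))) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstUnmatchedIndex("45 + 31.05 * 110 @ 54"));
        System.out.println(firstUnmatchedIndex("45 + 31.05 * 110"));
    }
}
```

A caller could throw its own exception whenever the returned index is not -1.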

What you could do is capture the valid matches in a single capture group and, when looping through the matches, check whether group 1 is not null.

If it is not, then the pattern has a valid group 1 match, else you can throw your Exception.

See a regex demo and a Java demo.

String regex = "([0-9]+(?:\\.[0-9]+)?|[+]|[*]|[0-9]+)|\\S+";
String string = "45 + 31.05 * 110 @ 54";

Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(string);

while (matcher.find()) {
    if (matcher.group(1) == null) {
        // your Exception here
        // throw new Exception("No match!");
        System.out.println(matcher.group() + " -> no match");
    } else {
        System.out.println(matcher.group(1) + " -> match");
    }
}

Output

45 -> match
+ -> match
31.05 -> match
* -> match
110 -> match
@ -> no match
54 -> match
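If the token type is needed as well, as in the structure the question expects, the same idea could be extended with one named group per type. This is only a sketch: the group names, the NamedGroupTokenizer class and the "type: text" output format are invented here, and IllegalArgumentException stands in for the asker's InvalidInputException. Note that alternation is tried left to right, so in this sketch the float alternative requires a fractional part and is listed before the integer one; otherwise "45" would always be claimed by the first alternative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical tokenizer; names and output format are illustrative only.
class NamedGroupTokenizer {
    // Group names that represent valid token types, in reporting order.
    private static final String[] TYPES =
            {"Float", "Integer", "addition", "multiplication"};

    // The last alternative catches any other non-whitespace character.
    private static final Pattern TOKEN = Pattern.compile(
            "(?<Float>[0-9]+\\.[0-9]+)"
            + "|(?<Integer>[0-9]+)"
            + "|(?<addition>[+])"
            + "|(?<multiplication>[*])"
            + "|(?<unknown>\\S)");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(input);
        while (matcher.find()) {
            if (matcher.group("unknown") != null) {
                // Stand-in for the question's InvalidInputException.
                throw new IllegalArgumentException(
                        "Unexpected character '" + matcher.group()
                        + "' at index " + matcher.start());
            }
            // Exactly one of the type groups is non-null for a valid match.
            for (String type : TYPES) {
                String text = matcher.group(type);
                if (text != null) {
                    tokens.add(type + ": " + text);
                    break;
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("45 + 31.05 * 110"));
    }
}
```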

Upvotes: 2
