Arkadi

Reputation: 1

Can't use the '-' character in a Java regular expression; the pattern isn't found in the given text

This is my first time using regular expressions and I have run into some problems.

I'm writing a simple compiler program, and right now I'm working on a "parsing" module that takes an assembler line and splits it into parts.

One part of the line may consist of one of these expressions:

String comp = "[(0)(1)(-1)(D)(A)(!D)(!A)(-D)(-A)(D+1)(A+1)(D-1)(A-1)(D+A)(D-A)(A-D)(D&A)(D|A)(M)(M+1)(M-1)(D+M)(D-M)(M-D)(D&M)(D|M)]";

For now I just want to check whether a given expression matches this regular expression.

Java refuses to compile this pattern and reports:

Illegal character range near index 46
[(0)(1)(-1)(D)(A)(!D)(!A)(-D)(-A)(D+1)(A+1)(D-1)(A-1)(D+A)(D-A)(A-D)(D&A)(D|A)(M)(M+1)(M-1)(D+M)(D-M)(M-D)(D&M)(D|M)]

I tried to do it like that:

    String comp = "[(0)(1)(\\-1)(D)(A)(!D)(!A)(\\-D)(\\-A)(D+1)(A+1)(D\\-1)(A\\-1)(D+A)(D\\-A)(A-D)(D&A)(D|A)(M)(M+1)(M\\-1)(D+M)(D\\-M)(M\\-D)(D&M)(D|M)]";

That makes the pattern compile, but it only finds a match for strings like "D" or "1", not for "D+1" or "D-1". What is the problem and how can I fix it?

Upvotes: 0

Views: 985

Answers (2)

Bart Kiers

Reputation: 170298

When you wrap square brackets around (a part of) your regex, it becomes a character set (or character class). A character set always matches just one character. So your regex:

[(0)(1)(-1)(D)(A)(!D)(!A)(-D)(-A)(D+1)(A+1)(D-1)(A-1)(D+A)(D-A)(A-D)(D&A)(D|A)(M)(M+1)(M-1)(D+M)(D-M)(M-D)(D&M)(D|M)]

matches just one of:

'(', '0', ')', '1', '-', ... , '+', ...

Also notice that meta characters like (, ) and + have no special meaning inside character sets. A character set has its own meta characters, like -, which is used to denote a range. For example, [a-c] matches either a, b or c.
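
As a small illustration of these rules, here is a sketch (the class name `CharClassDemo` and the helper `compiles` are made up for this example) that checks a range, shows `(`, `)` and `+` acting as plain characters inside a set, and shows how an unescaped `-` between `D` and `1` is rejected as a range:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class CharClassDemo {
    /** Returns true if the pattern compiles, false if Java rejects it. */
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // [a-c] is a range: it matches exactly one of a, b or c.
        System.out.println(Pattern.matches("[a-c]", "b"));     // true
        System.out.println(Pattern.matches("[a-c]", "ab"));    // false, two characters

        // Inside [...] the characters ( ) and + are ordinary literals.
        System.out.println(Pattern.matches("[(D+1)]", "+"));   // true
        System.out.println(Pattern.matches("[(D+1)]", "D+1")); // false, three characters

        // D-1 is read as a range from 'D' down to '1', which is not a valid range.
        System.out.println(compiles("[(D-1)]"));               // false

        // An escaped hyphen is a literal '-', so the set compiles again.
        System.out.println(compiles("[(D\\-1)]"));             // true
        System.out.println(Pattern.matches("[(D\\-1)]", "-")); // true
    }
}
```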

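That is why you can't use an unescaped - in your regex: D-1 is read as a range from D to 1, which is invalid because D comes after 1 in character order. Of course, your regex shouldn't be a character set in the first place; to match one of several multi-character strings, separate them with | (alternation) instead.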
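
One way to avoid the character set altogether is to join the allowed strings with `|`. The sketch below is only one possible approach, not necessarily what your parser should end up using: the names `CompPattern`, `COMPS`, `COMP` and `isComp` are invented for the example, and it relies on `Pattern.quote` so that `+`, `|` and `-` inside the strings are treated literally.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class CompPattern {
    // The expressions listed in the question, as plain strings.
    static final List<String> COMPS = Arrays.asList(
        "0", "1", "-1", "D", "A", "!D", "!A", "-D", "-A",
        "D+1", "A+1", "D-1", "A-1", "D+A", "D-A", "A-D", "D&A", "D|A",
        "M", "M+1", "M-1", "D+M", "D-M", "M-D", "D&M", "D|M");

    // Each string is quoted so its characters are literal, then the
    // quoted pieces are joined with | to form an alternation.
    static final Pattern COMP = Pattern.compile(
        COMPS.stream().map(Pattern::quote).collect(Collectors.joining("|")));

    // matches() requires the whole text to be one of the alternatives.
    static boolean isComp(String text) {
        return COMP.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(isComp("D+1")); // true
        System.out.println(isComp("D-1")); // true
        System.out.println(isComp("D|M")); // true
        System.out.println(isComp("D+2")); // false
    }
}
```

If the alternation is embedded in a larger pattern, it should be wrapped in a group such as `(?:...)`. With `find()` instead of `matches()`, the order of the alternatives starts to matter, because the first alternative that matches at a position wins; listing longer strings like `D+1` before `D` avoids partial matches.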

More info about character sets: http://www.regular-expressions.info/charclass.html

Upvotes: 3

Peter Lawrey

Reputation: 533870

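The problem appears to be that grouping with () does not work inside []. Inside a character class the parentheses are ordinary characters, so the text between them is treated as a set of individual characters rather than as one multi-character alternative. Groups joined with | behave the way you expect: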

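import java.util.regex.Pattern;

public class RegexTest {
public static void main(String... args) {
    test("[(C)(D1)]", "D");       // true! D is one of the characters in the set
    test("[(D)(D1)]", "D1");      // false: a set matches only one character
    test("((D)|(D1))", "D1");     // true: groups joined by | are alternatives
    test("[(D)(D+1)]", "D+1");    // false
    test("[(D)(D+1)]", "+");      // true! + is a literal inside a set
    test("[(D)(D\\+1)]", "D+1");  // false: escaping + does not change that
    test("((D)|(D\\+1))", "D+1"); // true
}

// Anchors the regex with ^ and $ so that the whole text has to match.
private static void test(String regex, String text) {
    Pattern pattern = Pattern.compile("^" + regex + "$");
    System.out.println(regex + " matches " + text + " is " + pattern.matcher(text).find());
}
}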

prints

[(C)(D1)] matches D is true
[(D)(D1)] matches D1 is false
((D)|(D1)) matches D1 is true
[(D)(D+1)] matches D+1 is false
[(D)(D+1)] matches + is true
[(D)(D\+1)] matches D+1 is false
((D)|(D\+1)) matches D+1 is true
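
As a side note, the same whole-text check can be written with `Matcher.matches()` instead of adding `^` and `$` by hand. This is just a sketch of that variant; `MatchesDemo` and `wholeMatch` are names made up for the example.

```java
import java.util.regex.Pattern;

public class MatchesDemo {
    // matches() succeeds only if the entire text matches the regex,
    // so the pattern does not need ^ and $ around it.
    static boolean wholeMatch(String regex, String text) {
        return Pattern.compile(regex).matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(wholeMatch("[(D)(D+1)]", "D+1"));    // false
        System.out.println(wholeMatch("((D)|(D\\+1))", "D+1")); // true
        System.out.println(wholeMatch("D|D\\+1", "D+1"));       // true
        System.out.println(wholeMatch("D|D\\+1", "DX"));        // false
    }
}
```

One reason to prefer this form is that it stays correct for an alternation without outer parentheses: `"^" + "D|D\\+1" + "$"` would be read as `^D` or `D\+1$`, so `find()` would also accept text such as `DX`.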

Upvotes: 0
