Reputation: 1
This is my first time using regular expressions and I have run into some problems.
I'm writing a simple compiler program, and I'm now working on a "parsing" module which takes an assembler line and splits it into parts.
One part of the line may consist of one of these expressions:
String comp = "[(0)(1)(-1)(D)(A)(!D)(!A)(-D)(-A)(D+1)(A+1)(D-1)(A-1)(D+A)(D-A)(A-D)(D&A)(D|A)(M)(M+1)(M-1)(D+M)(D-M)(M-D)(D&M)(D|M)]";
For now I just want to check whether a given string matches this regular expression.
Java rejects the expression and reports:
Illegal character range near index 46 [(0)(1)(-1)(D)(A)(!D)(!A)(-D)(-A)(D+1)(A+1)(D-1)(A-1)(D+A)(D-A)(A-D)(D&A)(D|A)(M)(M+1)(M-1)(D+M)(D-M)(M-D)(D&M)(D|M)]
I tried to do it like that:
String comp = "[(0)(1)(\\-1)(D)(A)(!D)(!A)(\\-D)(\\-A)(D+1)(A+1)(D\\-1)(A\\-1)(D+A)(D\\-A)(A-D)(D&A)(D|A)(M)(M+1)(M\\-1)(D+M)(D\\-M)(M\\-D)(D&M)(D|M)]";
That version is accepted, but it only finds a match for strings like "D" or "1" and not for "D+1" or "D-1". What is the problem, and how can I fix it?
Upvotes: 0
Views: 985
Reputation: 170298
When you wrap square brackets around (a part of) your regex, it becomes a character set (or character class). A character set always matches just one character. So your regex:
[(0)(1)(-1)(D)(A)(!D)(!A)(-D)(-A)(D+1)(A+1)(D-1)(A-1)(D+A)(D-A)(A-D)(D&A)(D|A)(M)(M+1)(M-1)(D+M)(D-M)(M-D)(D&M)(D|M)]
matches just one of:
'(', '0', ')', '1', '-', ... , '+', ...
Also notice that meta characters like '(', ')' and '+' have no special meaning inside character sets. A character set has its own meta characters, like '-', which is used to denote a range. For example, [a-c] matches either a, b or c.
That is why you can't use the '-' in your regex, which shouldn't be a character set, of course.
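To make the points above concrete, here is a minimal sketch. The class name CharClassDemo and the helpers matchesWhole and compileError are made up for illustration; only Pattern and PatternSyntaxException come from java.util.regex.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class, not part of the original question.
final class CharClassDemo {

    // True if the whole text matches the regex.
    static boolean matchesWhole(String regex, String text) {
        return Pattern.matches(regex, text);
    }

    // Returns the error description if the regex is rejected, or null if it compiles.
    static String compileError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription();
        }
    }

    public static void main(String[] args) {
        // A character set matches exactly one character.
        System.out.println(matchesWhole("[a-c]", "b"));
        System.out.println(matchesWhole("[a-c]", "ab"));

        // Inside a set, '(', ')' and '+' are plain characters.
        System.out.println(matchesWhole("[(D)(D+1)]", "+"));
        System.out.println(matchesWhole("[(D)(D+1)]", "D+1"));

        // "D-1" inside a set is read as a range from 'D' down to '1', which is rejected.
        System.out.println(compileError("[(D-1)]"));

        // Escaping the '-' makes it a literal, so the set compiles,
        // but it still matches only a single character.
        System.out.println(compileError("[(D\\-1)]"));
    }
}
```

The last two calls mirror the two versions in the question: the unescaped one fails to compile, and the escaped one compiles but remains a single-character set.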
More info about character sets: http://www.regular-expressions.info/charclass.html
Upvotes: 3
Reputation: 533870
The problem appears to be that you cannot use multiple characters inside a () inside a [] this way. It appears to turn the string in () into individual characters.
import java.util.regex.Pattern;

public class Main {
    public static void main(String... args) {
        test("[(C)(D1)]", "D"); // true!
        test("[(D)(D1)]", "D1");
        test("((D)|(D1))", "D1");
        test("[(D)(D+1)]", "D+1"); // false
        test("[(D)(D+1)]", "+"); // true!
        test("[(D)(D\\+1)]", "D+1");
        test("((D)|(D\\+1))", "D+1");
    }

    // Anchors the regex with ^ and $ so that it has to match the whole text.
    private static void test(String regex, String text) {
        Pattern pattern = Pattern.compile("^" + regex + "$");
        System.out.println(regex + " matches " + text + " is " + pattern.matcher(text).find());
    }
}
prints
[(C)(D1)] matches D is true
[(D)(D1)] matches D1 is false
((D)|(D1)) matches D1 is true
[(D)(D+1)] matches D+1 is false
[(D)(D+1)] matches + is true
[(D)(D\+1)] matches D+1 is false
((D)|(D\+1)) matches D+1 is true
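Building on the alternation form that works above, one possible way to cover the whole list from the question is to keep the expressions as plain strings and let Pattern.quote escape them. This is only a sketch: the class name CompMatcher and the method isComp are made up here, and it assumes Java 8 or later for the stream calls.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper class, not from the original question.
final class CompMatcher {

    // The expressions listed in the question, as literal strings.
    static final List<String> COMPS = Arrays.asList(
            "0", "1", "-1", "D", "A", "!D", "!A", "-D", "-A",
            "D+1", "A+1", "D-1", "A-1", "D+A", "D-A", "A-D", "D&A", "D|A",
            "M", "M+1", "M-1", "D+M", "D-M", "M-D", "D&M", "D|M");

    // Pattern.quote turns each string into a literal, so '+', '|' and '-'
    // need no manual escaping; the pieces are then joined with '|'.
    static final Pattern COMP = Pattern.compile(
            COMPS.stream().map(Pattern::quote).collect(Collectors.joining("|")));

    // matches() requires the entire text to match one of the alternatives.
    static boolean isComp(String text) {
        return COMP.matcher(text).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"D", "D+1", "D-1", "D|M", "D+", "X"}) {
            System.out.println(s + " -> " + isComp(s));
        }
    }
}
```

Because matches() checks the whole string, the order of the alternatives does not matter here. If the pattern were used with find() on a longer line, the longer alternatives such as "D+1" would probably need to come before "D".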
Upvotes: 0