Reputation: 3128
In Java, I was unable to get a regex to behave the way I wanted, and wrote this little JUnit test to demonstrate the problem:
public void testLookahead() throws Exception {
    Pattern p = Pattern.compile("ABC(?!!)");
    assertTrue(p.matcher("ABC").find());
    assertTrue(p.matcher("ABCx").find());
    assertFalse(p.matcher("ABC!").find());
    assertFalse(p.matcher("ABC!x").find());
    assertFalse(p.matcher("blah/ABC!/blah").find());

    p = Pattern.compile("[A-Z]{3}(?!!)");
    assertTrue(p.matcher("ABC").find());
    assertTrue(p.matcher("ABCx").find());
    assertFalse(p.matcher("ABC!").find());
    assertFalse(p.matcher("ABC!x").find());
    assertFalse(p.matcher("blah/ABC!/blah").find());

    p = Pattern.compile("[A-Z]{3}(?!!)", Pattern.CASE_INSENSITIVE);
    assertTrue(p.matcher("ABC").find());
    assertTrue(p.matcher("ABCx").find());
    assertFalse(p.matcher("ABC!").find());
    assertFalse(p.matcher("ABC!x").find());
    assertFalse(p.matcher("blah/ABC!/blah").find()); //fails, why?

    p = Pattern.compile("[A-Za-z]{3}(?!!)");
    assertTrue(p.matcher("ABC").find());
    assertTrue(p.matcher("ABCx").find());
    assertFalse(p.matcher("ABC!").find());
    assertFalse(p.matcher("ABC!x").find());
    assertFalse(p.matcher("blah/ABC!/blah").find()); //fails, why?
}
Every assertion passes except the two marked with a comment. The four groups are identical except for the pattern string. Why would allowing lower-case letters break the matcher?
Upvotes: 1
Views: 401
Reputation: 170148
Your tests fail because both patterns, [A-Z]{3}(?!!) (with CASE_INSENSITIVE) and [A-Za-z]{3}(?!!), find at least one match in "blah/ABC!/blah" (each finds bla twice).
A simple test shows this:
Pattern p = Pattern.compile("[A-Z]{3}(?!!)", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher("blah/ABC!/blah");
while (m.find()) {
    System.out.println(m.group());
}
prints:
bla
bla
Upvotes: 1
Reputation: 35790
Those two assertions fail because find() returns true: it looks for any substring that matches the pattern, not just ABC. Specifically, bla (the first three letters of blah) matches the regular expression (three letters not followed by an exclamation mark). The case-sensitive patterns correctly find nothing because blah isn't upper-case.
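To see the difference side by side, here is a minimal sketch that runs the three character-class patterns from the question against the same input and records where each match starts. The class name LookaheadDemo and the helper findAll are made up for this illustration; only Pattern and Matcher come from java.util.regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadDemo {
    // Hypothetical helper (not part of java.util.regex): collects every
    // match found by repeated find() calls as "matchedText@startIndex".
    static List<String> findAll(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        List<String> hits = new ArrayList<>();
        while (m.find()) {
            hits.add(m.group() + "@" + m.start());
        }
        return hits;
    }

    public static void main(String[] args) {
        String input = "blah/ABC!/blah";
        // Upper-case only: ABC is the sole candidate and the lookahead rejects it.
        System.out.println(findAll("[A-Z]{3}(?!!)", 0, input));
        // prints []
        // Case-insensitive: "bla" now qualifies, at index 0 and again at index 10.
        System.out.println(findAll("[A-Z]{3}(?!!)", Pattern.CASE_INSENSITIVE, input));
        // prints [bla@0, bla@10]
        // Explicit a-z range: same result as the case-insensitive pattern.
        System.out.println(findAll("[A-Za-z]{3}(?!!)", 0, input));
        // prints [bla@0, bla@10]
    }
}
```

The lookahead still rejects ABC in all three cases; the extra matches come entirely from the lower-case text around it.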
Upvotes: 1