romacafe

Reputation: 3128

Java RegEx with lookahead failing

In Java, I was unable to get a regex to behave the way I wanted, and wrote this little JUnit test to demonstrate the problem:

public void testLookahead() throws Exception {
    Pattern p = Pattern.compile("ABC(?!!)");
    assertTrue(p.matcher("ABC").find());
    assertTrue(p.matcher("ABCx").find());
    assertFalse(p.matcher("ABC!").find());
    assertFalse(p.matcher("ABC!x").find());
    assertFalse(p.matcher("blah/ABC!/blah").find());

    p = Pattern.compile("[A-Z]{3}(?!!)");
    assertTrue(p.matcher("ABC").find());
    assertTrue(p.matcher("ABCx").find());
    assertFalse(p.matcher("ABC!").find());
    assertFalse(p.matcher("ABC!x").find());
    assertFalse(p.matcher("blah/ABC!/blah").find());

    p = Pattern.compile("[A-Z]{3}(?!!)", Pattern.CASE_INSENSITIVE);
    assertTrue(p.matcher("ABC").find());
    assertTrue(p.matcher("ABCx").find());
    assertFalse(p.matcher("ABC!").find());
    assertFalse(p.matcher("ABC!x").find());
    assertFalse(p.matcher("blah/ABC!/blah").find()); //fails, why?

    p = Pattern.compile("[A-Za-z]{3}(?!!)");
    assertTrue(p.matcher("ABC").find());
    assertTrue(p.matcher("ABCx").find());
    assertFalse(p.matcher("ABC!").find());
    assertFalse(p.matcher("ABC!x").find());
    assertFalse(p.matcher("blah/ABC!/blah").find());  //fails, why?
}

Every assertion passes except the two marked with a comment. The four groups of assertions are identical apart from the pattern. Why would adding case-insensitivity break the matcher?

Upvotes: 1

Views: 401

Answers (2)

Bart Kiers

Reputation: 170148

Your tests fail because in both cases the patterns, [A-Z]{3}(?!!) with CASE_INSENSITIVE and [A-Za-z]{3}(?!!), find at least one match in "blah/ABC!/blah": they match bla twice, once in each blah.

A simple test shows this:

Pattern p = Pattern.compile("[A-Z]{3}(?!!)", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher("blah/ABC!/blah");
while (m.find()) {
    System.out.println(m.group());
}

prints:

bla
bla
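To see where those two matches come from, here is a minimal sketch that also records each match's start index. The class name MatchPositions and the helper locate are made up for illustration; only Pattern and Matcher come from java.util.regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original question.
public class MatchPositions {
    // Returns "text@start" for every match that find() reports.
    static List<String> locate(String regex, int flags, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        while (m.find()) {
            out.add(m.group() + "@" + m.start());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(
            locate("[A-Z]{3}(?!!)", Pattern.CASE_INSENSITIVE, "blah/ABC!/blah"));
        // prints [bla@0, bla@10]
    }
}
```

Both matches start inside a blah (indexes 0 and 10); the ABC at index 5 is never reported, so the lookahead itself is doing its job.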

Upvotes: 1

CanSpice

Reputation: 35790

Those two assertions fail because find() returns true: the full string contains substrings that match the pattern. Specifically, bla (the first three letters of blah) matches the regular expression, which asks for three letters not followed by an exclamation mark. With the case-sensitive patterns, find() correctly returns false because blah isn't upper-case.
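The difference can be checked side by side with a small sketch. The class name LookaheadDemo and the helper findAll are assumptions made for this example; the patterns and the input string are the ones from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original question.
public class LookaheadDemo {
    // Collects every substring that find() reports for the given pattern.
    static List<String> findAll(Pattern p, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String input = "blah/ABC!/blah";
        Pattern sensitive = Pattern.compile("[A-Z]{3}(?!!)");
        Pattern insensitive =
            Pattern.compile("[A-Z]{3}(?!!)", Pattern.CASE_INSENSITIVE);

        System.out.println(findAll(sensitive, input));   // prints []
        System.out.println(findAll(insensitive, input)); // prints [bla, bla]
    }
}
```

With the case-sensitive pattern the only three-letter candidate is ABC, which the lookahead rejects, so find() returns false. Once lower-case letters are allowed, bla becomes a candidate and nothing rejects it.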

Upvotes: 1
