Reputation: 10548
In this Java code:
public class Main {
public static void main(String[] args) {
"".matches("(?<!((.{0,1}){0,1}))");
}
}
the JVM (I'm using 1.6.0_17-b04) throws "Exception ... Look-behind group does not have an obvious maximum length" at run time, when the pattern is compiled. I read here that:
Java takes things a step further by allowing finite repetition. You still cannot use the star or plus, but you can use the question mark and the curly braces with the max parameter specified. Java recognizes the fact that finite repetition can be rewritten as an alternation of strings with different, but fixed lengths.
But the look-behind in the code above has a very obvious finite maximum length: 1 (a simple product of the two upper bounds).
The real problem is, of course, in more complex patterns, like:
(?<!bad(\s{1,99}(\S{1,99}\s{1,99}){0,6}))good
(The word "good" with no "bad" behind it, within a 7-word range.)
How can I fix it?
Upvotes: 2
Views: 1421
Reputation: 3364
If you remove the capture groups from the negative look-behind then it seems to compile. I'm not even sure what the intent was or what the capture groups should be doing in a negative look-behind. Is that intentional?
Edit to clarify:
You wrote the regex:
"(?<!((.{0,1}){0,1}))"
The "(?<!" part indicates a negative look-behind, meaning you want to find matches where this does not occur immediately before the current position. Yet it is chock full of capture groups, i.e. all of those naked (). That doesn't make any sense, since they can't possibly capture anything inside a negative look-behind. (In case you aren't fluent in regex, capture groups are used to pull specific sub-ranges out of the match after the match has happened.)
Take all of those parentheses out and you will no longer get the error... not to mention that they are unnecessary:
"(?<!.{0,1}{0,1})"
The above will work without error, for example. If you really need parentheses in a negative look-behind, then you should use non-capturing groups like "(?:mypattern)". In this simple example they don't really do anything for you either way, and the double {0,1} is redundant.
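Note that the error is a PatternSyntaxException thrown by Pattern.compile at run time, so you can check candidate patterns quickly without running your whole program. Here is a minimal sketch; the class name LookBehindCheck and the helper compiles are made up for illustration, and which patterns are accepted may depend on your JDK version, so I'm not listing expected output:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class LookBehindCheck {
    // Hypothetical helper: true if the regex compiles, false if Pattern.compile rejects it.
    public static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            // getDescription() holds the message, e.g. the look-behind complaint.
            System.out.println("Rejected: " + e.getDescription());
            return false;
        }
    }

    public static void main(String[] args) {
        // The original pattern, with capture groups inside the look-behind.
        System.out.println(compiles("(?<!((.{0,1}){0,1}))"));
        // The same look-behind with the parentheses removed.
        System.out.println(compiles("(?<!.{0,1}{0,1})"));
    }
}
```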
Edit 2:
So I tried to get your more complicated example to work, and even switching to non-capturing groups doesn't get rid of the Java regex engine's confusion. The only way to work around it seems to be to get rid of the {0,6}, as suggested in the comments, and write the optional group out six times.
For example, this will compile:
"(?<!bad(?:\\s{1,99}(?:\\S{1,99}\\s{1,99})?(?:\\S{1,99}\\s{1,99})?(?:\\S{1,99}\\s{1,99})?(?:\\S{1,99}\\s{1,99})?(?:\\S{1,99}\\s{1,99})?(?:\\S{1,99}\\s{1,99})?))good"
...and does the same thing, but it's a lot uglier.
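If typing the repetition out by hand bothers you, you could build the string in code. This is only a sketch of the same unrolling trick; UnrolledLookBehind, optionalTimes, goodWithoutBad and hasAllowedGood are names I made up, not anything from the regex API:

```java
import java.util.regex.Pattern;

class UnrolledLookBehind {
    // Hypothetical helper: emits "(?:unit)?" n times, standing in for "(?:unit){0,n}".
    public static String optionalTimes(String unit, int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append("(?:").append(unit).append(")?");
        }
        return sb.toString();
    }

    // Builds the same pattern as the hand-written one above.
    public static Pattern goodWithoutBad() {
        String wordAndSpace = "\\S{1,99}\\s{1,99}";
        return Pattern.compile(
                "(?<!bad(?:\\s{1,99}" + optionalTimes(wordAndSpace, 6) + "))good");
    }

    // True if some "good" in the text has no "bad" in the range before it.
    public static boolean hasAllowedGood(String text) {
        return goodWithoutBad().matcher(text).find();
    }

    public static void main(String[] args) {
        // "bad" sits two words before "good", so the look-behind blocks the match.
        System.out.println(hasAllowedGood("bad one two good"));  // false
        // No "bad" anywhere, so "good" matches.
        System.out.println(hasAllowedGood("fine one two good")); // true
    }
}
```

The generated string is identical to the long literal, so it compiles under the same conditions; the loop just keeps the source readable.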
This may be a case where regex is not the complete answer but just part of a larger solution that requires more than one pass.
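As a rough idea of what a multi-pass version might look like: find each "good" with a plain pattern, then inspect the preceding words in ordinary code. This is an approximation rather than an exact equivalent of the look-behind (it compares whole whitespace-separated words against "bad" and doesn't require whitespace right before "good"), and TwoPassCheck, badPrecedes and MAX_WORDS_BACK are names I invented for the sketch:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TwoPassCheck {
    private static final Pattern GOOD = Pattern.compile("good");
    // Assumption: "bad" counts if it is one of the 7 words before "good".
    private static final int MAX_WORDS_BACK = 7;

    // Pass 1: locate each "good"; pass 2: check the words before it.
    public static boolean hasAllowedGood(String text) {
        Matcher m = GOOD.matcher(text);
        while (m.find()) {
            if (!badPrecedes(text.substring(0, m.start()))) {
                return true;
            }
        }
        return false;
    }

    // True if "bad" is among the last MAX_WORDS_BACK words of the prefix.
    public static boolean badPrecedes(String prefix) {
        String trimmed = prefix.trim();
        if (trimmed.isEmpty()) {
            return false;
        }
        String[] words = trimmed.split("\\s+");
        int from = Math.max(0, words.length - MAX_WORDS_BACK);
        for (int i = from; i < words.length; i++) {
            if (words[i].equals("bad")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasAllowedGood("bad one two good"));
        System.out.println(hasAllowedGood("fine one two good"));
    }
}
```

This avoids look-behind entirely, so the maximum-length restriction never comes into play, at the cost of a few more lines.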
Upvotes: 3