Reputation: 14678
I'm trying to match the string iso_schematron_skeleton_for_xslt1.xsl against the regexp ([a-zA-Z|_])?(\w+|_|\.|-)+(@\d{4}-\d{2}-\d{2})?\.yang.
The expected result is false; it should not match.
The problem is that the call to matcher.matches() never returns.
Is this a bug in the Java regexp implementation?
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class HelloWorld {
        private static final Pattern YANG_MODULE_RE = Pattern
                .compile("([a-zA-Z|_])?(\\w+|_|\\.|-)+(@\\d{4}-\\d{2}-\\d{2})?\\.yang");

        public static void main(String[] args) {
            final Matcher matcher = YANG_MODULE_RE.matcher("iso_schematron_skeleton_for_xslt1.xsl");
            System.out.println(Boolean.toString(matcher.matches()));
        }
    }
I'm using:
openjdk version "1.8.0_181"
OpenJDK Runtime Environment (build 1.8.0_181-b15)
OpenJDK 64-Bit Server VM (build 25.181-b15, mixed mode)
Upvotes: 10
Views: 379
Reputation: 627103
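The pattern contains nested quantifiers: \w+ is inside a group that is itself quantified with +. For a non-matching string, the regex engine has to try every way of dividing the input between the inner and the outer quantifier before it can report a failure, and the number of those combinations grows exponentially with the length of the input (catastrophic backtracking). That is why matches() appears to hang. It makes more sense to turn the alternation group into a character class, i.e. (\\w+|_|\\.|-)+ => [\\w.-]+.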
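To get a feel for the growth, here is a rough sketch that counts in how many ways a run of n word characters can be cut into consecutive non-empty \w+ chunks, which is what (\w+)+ allows. It is only an illustration of the combinatorics, not a measurement of what java.util.regex actually does; the class and method names are made up for this example.

```java
public class SplitCount {
    // Number of ways to cut a run of n characters into consecutive
    // non-empty chunks (each chunk standing for one \w+ iteration).
    static long waysToSplit(int n) {
        long[] ways = new long[n + 1];
        ways[0] = 1; // the empty prefix can be split in exactly one way
        for (int end = 1; end <= n; end++) {
            // the last chunk covers [start, end); everything before it
            // can be split in ways[start] ways
            for (int start = 0; start < end; start++) {
                ways[end] += ways[start];
            }
        }
        return ways[n];
    }

    public static void main(String[] args) {
        // The part of the input before ".xsl" consists of word characters only.
        int prefixLength = "iso_schematron_skeleton_for_xslt1".length();
        for (int n : new int[] {4, 8, 16, prefixLength}) {
            System.out.println(n + " chars -> " + waysToSplit(n) + " splits");
        }
    }
}
```

The count doubles with every extra character, so the 33 word characters before .xsl already allow 2^32 splits, and the extra _ alternative in the original group only adds to that.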
Note that \w already matches _. Also, a | inside a character class matches a literal | char: [a|b] matches a, | or b, so it seems you should remove the | from your first character class.
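Both points are easy to check with a few one-character matches. A minimal sketch (the class and helper names are just for illustration):

```java
import java.util.regex.Pattern;

public class CharClassDemo {
    // True when the whole input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("[a|b]", "|")); // true: | is a literal inside [...]
        System.out.println(fullMatch("[ab]", "|"));  // false
        System.out.println(fullMatch("\\w", "_"));   // true: \w already covers _
    }
}
```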
Use
.compile("[a-zA-Z_]?[\\w.-]+(?:@\\d{4}-\\d{2}-\\d{2})?\\.yang")
Note that you may use a non-capturing group ((?:...)) instead of a capturing one to avoid overhead you do not need, as you are only checking for a match and not extracting substrings.
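For completeness, here is the question's test program with the rewritten pattern dropped in, as a sketch. The file names other than the one from the question are made-up examples, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class YangModuleName {
    // Rewritten pattern: no nested quantifiers, no stray | in the first class,
    // and a non-capturing group for the optional revision date.
    private static final Pattern YANG_MODULE_RE = Pattern
            .compile("[a-zA-Z_]?[\\w.-]+(?:@\\d{4}-\\d{2}-\\d{2})?\\.yang");

    static boolean isYangModule(String fileName) {
        return YANG_MODULE_RE.matcher(fileName).matches();
    }

    public static void main(String[] args) {
        // Input from the question: returns promptly and prints false.
        System.out.println(isYangModule("iso_schematron_skeleton_for_xslt1.xsl"));
        // Hypothetical file names for illustration.
        System.out.println(isYangModule("ietf-interfaces.yang"));            // true
        System.out.println(isYangModule("ietf-interfaces@2018-02-20.yang")); // true
        System.out.println(isYangModule("ietf-interfaces@2018.yang"));       // false
    }
}
```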
See the regex demo (as the pattern is used with matches() and thus requires a full string match, I added ^ and $ in the regex demo).
Upvotes: 9