Christian Bongiorno

Reputation: 5648

Invalid regex is accepted by Java. Is this a Java bug, or am I misinterpreting what to expect?

According to several websites, this pattern is not a valid regex, yet Java compiles it:

groovy:000> java.util.regex.Pattern.compile("^*");
===> ^*

But Node rejects the same expression:

$ node
> new RegExp('^*')
SyntaxError: Invalid regular expression: /^*/: Nothing to repeat

Who's right here: Java, or Node and the internet? Or am I just expecting something from the Java libs that I shouldn't?

Upvotes: 1

Views: 100

Answers (1)

Bart Kiers

Reputation: 170308

I'd say the regex test tools you linked to are wrong (in the PCRE sense). I think this is because JS implementations handle these matches differently (see: https://github.com/gskinner/regexr/issues/28).

Note that both regexr and regex101 accept ^()* and (^)*. Also, Perl v5.18.2 has no issue with it: running echo "ubar" | perl -ne "s/^*/F/; print;" from my terminal results in no warnings or errors and will print Fubar.
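For comparison, here is a minimal Java sketch of the same substitution as the Perl one-liner above. The class and method names are made up for illustration; only `Pattern.compile`, `Pattern.matcher` and `Matcher.replaceFirst` come from `java.util.regex`. It assumes Java treats `^*` as a zero-width match at the start of the input, which is consistent with the pattern compiling in the question.

```java
import java.util.regex.Pattern;

public class CaretStarReplace {
    // Hypothetical helper: replaces the first match of "^*" in the input.
    static String replaceFirstCaretStar(String input, String replacement) {
        return Pattern.compile("^*").matcher(input).replaceFirst(replacement);
    }

    public static void main(String[] args) {
        // Mirrors: echo "ubar" | perl -ne "s/^*/F/; print;"
        System.out.println(replaceFirstCaretStar("ubar", "F"));
    }
}
```

If Java behaves like Perl here, the replacement is inserted at position 0 and the rest of the input is left untouched.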

This is what the PCRE specification says:

It is possible to construct infinite loops by following a subpattern that can match no characters with a quantifier that has no upper limit, for example:

(a?)*

Earlier versions of Perl and PCRE used to give an error at compile time for such patterns. However, because there are cases where this can be useful, such patterns are now accepted, but if any repetition of the subpattern does in fact match no characters, the loop is forcibly broken.

-- https://www.pcre.org/original/doc/html/pcrepattern.html

So repeating a zero-width match an unbounded number of times, as ^* does, is accepted by the spec.
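To check how Java lines up with this, a small sketch along these lines could be used. The class name and the `compiles` helper are hypothetical; the patterns are the ones mentioned in this answer, plus a bare `*` as a contrast case where there really is nothing to repeat.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class ZeroWidthQuantifierCheck {
    // Hypothetical helper: true if java.util.regex compiles the pattern.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Quantified zero-width subpatterns from the answer, plus a bare "*".
        for (String regex : new String[] {"^*", "(^)*", "^()*", "(a?)*", "*"}) {
            System.out.println(regex + " compiles: " + compiles(regex));
        }
        // Matching should terminate even though a? can match no characters.
        System.out.println(Pattern.matches("(a?)*", "aa"));
    }
}
```

A bare `*` is rejected by Java with a PatternSyntaxException because no token precedes the quantifier, whereas in `^*` the anchor is the token being repeated.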

Upvotes: 3
