ganesh

Reputation: 1

Java regex pattern matching vs XSD schema validation

Please consider this regex pattern: .*[a-zA-Z0-9\\-\\_].*
If I use Java regex pattern matching to match "-", the match succeeds.

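import java.util.regex.Matcher;
import java.util.regex.Pattern;

// In a Java string literal, each \\ is a single backslash in the actual regex.
String regexCostcode1 = ".*[a-zA-Z0-9\\-\\_].*";
Pattern regex_costcode = Pattern.compile(regexCostcode1);
String test = "-";
Matcher m = regex_costcode.matcher(test);
System.out.println(m.matches());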

This prints true.
But the same regex fails for "-" in XSD schema validation.

I also checked it at http://regexr.com/, and there it fails to match "-" as well.

So why does it match with Java pattern matching?

Upvotes: 0

Views: 738

Answers (2)

walen

Reputation: 7273

Outside of Java string literals you don't need to double the backslashes; the doubling is only how a Java string literal spells a single backslash. So your regex should be .*[a-zA-Z0-9\\-\\_].* in Java source and .*[a-zA-Z0-9\-\_].* in XSD schema validation. (The underscore is not a special character, so it can also be written unescaped as _.)

If you enter .*[a-zA-Z0-9\\-\\_].* on the site you mentioned, it tells you that \\-\\ is interpreted as a "range of characters from \ to \", since \\ is just an escaped backslash.
If you enter .*[a-zA-Z0-9\-\_].* instead, it interprets \- as an escaped hyphen and correctly matches -.
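
To make the two escaping levels visible, here is a minimal sketch using only java.util.regex. The class name EscapeLevels and its constants are made up for illustration. It prints the text the regex engine actually receives in each case, and whether "-" matches.

```java
import java.util.regex.Pattern;

final class EscapeLevels {
    // The literal from the question: the compiler turns each \\ into one backslash.
    static final String FROM_JAVA_LITERAL = ".*[a-zA-Z0-9\\-\\_].*";

    // The same characters pasted verbatim into regexr.com or an XSD.
    // Written as a Java literal, every backslash has to be doubled once more.
    static final String PASTED_VERBATIM = ".*[a-zA-Z0-9\\\\-\\\\_].*";

    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(FROM_JAVA_LITERAL);                // .*[a-zA-Z0-9\-\_].*
        System.out.println(matches(FROM_JAVA_LITERAL, "-"));  // true
        System.out.println(PASTED_VERBATIM);                  // .*[a-zA-Z0-9\\-\\_].*
        System.out.println(matches(PASTED_VERBATIM, "-"));    // false
        System.out.println(matches(PASTED_VERBATIM, "\\"));   // true: \\-\\ is the range from \ to \
    }
}
```

The second pattern is what regexr.com and the XSD processor see when the Java literal is copied over unchanged, which is why "-" stops matching there.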

Upvotes: 1

Wiktor Stribiżew

Reputation: 627341

Mind that in a Java string literal you need two backslashes to define one literal backslash. When you type \\ at regexr.com or in an XML Schema regex, you are writing two literal backslashes, which together match a single literal backslash in the input string, so the [\\-\\] construct is a range from \ to \ and matches a single \.

In XML Schema, you need to define the regex as

<xs:pattern value=".*[a-zA-Z0-9_-].*"/>

Put the - at the end of the character class so that it is parsed as a literal -. The underscore does not need to be escaped at all, as it is never a special char (it is actually a "word" char).
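
One way to try this out from Java is the JDK's javax.xml.validation API. The following is only a sketch under assumptions: the class XsdPatternCheck, the helper validates and the element name code are hypothetical, and the schema is a minimal wrapper around a single xs:pattern facet rather than the schema from the question.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class XsdPatternCheck {
    // Hypothetical helper: true if <code>value</code> validates against a schema
    // whose only restriction is the given xs:pattern.
    // The pattern is pasted into an XML attribute as-is, so it must not
    // contain ", < or & (none of the patterns discussed here do).
    static boolean validates(String xsdPattern, String value) {
        String xsd =
              "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
            + "<xs:element name=\"code\">"
            + "<xs:simpleType><xs:restriction base=\"xs:string\">"
            + "<xs:pattern value=\"" + xsdPattern + "\"/>"
            + "</xs:restriction></xs:simpleType>"
            + "</xs:element></xs:schema>";
        String doc = "<code>" + value + "</code>";

        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("schema did not compile: " + e.getMessage(), e);
        }
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(doc)));
            return true;
        } catch (SAXException e) {
            return false; // the document violates the pattern
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Pattern as recommended above: hyphen last, nothing escaped.
        System.out.println(validates(".*[a-zA-Z0-9_-].*", "-"));
        // The question's text pasted verbatim into the XSD (backslashes doubled
        // here only because this is a Java literal); the question reports that
        // "-" fails validation with it.
        System.out.println(validates(".*[a-zA-Z0-9\\\\-\\\\_].*", "-"));
    }
}
```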

I'd also advise using ".*[a-zA-Z0-9_-].*" in Java, to avoid any ambiguity.
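
As a quick illustration of that advice, the sketch below compiles the backslash-free pattern in Java. The class name CostCodePattern and the method isValid are invented for the example. Because the pattern contains no backslashes, the Java literal and the XSD attribute value are character-for-character identical.

```java
import java.util.regex.Pattern;

final class CostCodePattern {
    // Hyphen placed last in the class, so it is literal without any escaping.
    static final Pattern COST_CODE = Pattern.compile(".*[a-zA-Z0-9_-].*");

    static boolean isValid(String input) {
        return COST_CODE.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("-"));      // true
        System.out.println(isValid("_"));      // true
        System.out.println(isValid("AB-12"));  // true
        System.out.println(isValid("!"));      // false: no letter, digit, underscore or hyphen
        System.out.println(isValid(""));       // false: the class must consume one character
    }
}
```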

Upvotes: 2
