Regular expression works with java.util.regex.Pattern but not com.oroinc.text.regex.Perl5Matcher

Question

I came across a bug today in our legacy code, which uses Perl5Compiler and Perl5Matcher with the following regular expression to validate UK postcodes:

((?i)(([A-Z]{2}[0-9]{1,2})|([A-Z]{1,2}[0-9][A-Z])|([A-Z][0-9]{1,2}))\s([0-9][A-Z]{2})|(BFPO\s\d{1,4})|(GIR\s0AA))

However, it wrongly accepted postcodes such as 'G12 4NNT', even though the last part is only allowed to be a digit followed by two letters. I fixed this by switching to the java.util.regex.Pattern class, which handles the same regular expression correctly and passes all of my unit tests.
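For reference, a fix of that kind with java.util.regex might look roughly like the minimal sketch below. The class and method names (PostcodeValidator, isValid) are hypothetical because the original code isn't shown. The backslashes in the regex have to be doubled inside a Java string literal.

```java
import java.util.regex.Pattern;

class PostcodeValidator {
    // The regex from the question, with backslashes doubled for a Java string literal.
    private static final Pattern POSTCODE = Pattern.compile(
        "((?i)(([A-Z]{2}[0-9]{1,2})|([A-Z]{1,2}[0-9][A-Z])|([A-Z][0-9]{1,2}))\\s([0-9][A-Z]{2})"
        + "|(BFPO\\s\\d{1,4})|(GIR\\s0AA))");

    // Matcher.matches() succeeds only if the pattern matches the entire input.
    static boolean isValid(String postcode) {
        return POSTCODE.matcher(postcode).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"G12 4NN", "G12 4NNT", "SW1A 1AA", "BFPO 123", "GIR 0AA"}) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

Compiling the Pattern once in a static field avoids recompiling the expression on every call.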

However, now I'm curious why it didn't work with the Perl5 classes. Is there a fundamental difference between the regular expression syntax used by the two APIs?

stema · Accepted Answer

I think the problem is the same as in the question behind the answer linked above.

If you use the matches() method in Java:

text.matches("((?i)(([A-Z]{2}[0-9]{1,2})|([A-Z]{1,2}[0-9][A-Z])|([A-Z][0-9]{1,2}))\\s([0-9][A-Z]{2})|(BFPO\\s\\d{1,4})|(GIR\\s0AA))");

it matches against the complete string. To get the same behaviour in Perl, you have to put anchors around your expression:
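The difference can be reproduced with java.util.regex alone. In the sketch below (class and method names are illustrative), Matcher.find() searches for the pattern anywhere in the input, which is the unanchored behaviour Perl uses by default, while Matcher.matches() must consume the whole input. It assumes the legacy code was doing an unanchored search, which the question doesn't actually show.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindVersusMatches {
    static final String POSTCODE_REGEX =
        "((?i)(([A-Z]{2}[0-9]{1,2})|([A-Z]{1,2}[0-9][A-Z])|([A-Z][0-9]{1,2}))\\s([0-9][A-Z]{2})"
        + "|(BFPO\\s\\d{1,4})|(GIR\\s0AA))";
    static final Pattern POSTCODE = Pattern.compile(POSTCODE_REGEX);

    // find(): succeeds if the pattern occurs anywhere in the input (unanchored search).
    static boolean containsPostcode(String input) {
        return POSTCODE.matcher(input).find();
    }

    // matches(): succeeds only if the pattern covers the entire input.
    static boolean isWholePostcode(String input) {
        return POSTCODE.matcher(input).matches();
    }

    public static void main(String[] args) {
        Matcher m = POSTCODE.matcher("G12 4NNT");
        if (m.find()) {
            // Show which substring the unanchored search accepted.
            System.out.println("find() matched: \"" + m.group() + "\"");
        }
        System.out.println("matches(): " + isWholePostcode("G12 4NNT"));
    }
}
```

Running main shows that find() accepts the substring "G12 4NN" and ignores the trailing "T", which is exactly the false positive described in the question, while matches() returns false.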

^((?i)(([A-Z]{2}[0-9]{1,2})|([A-Z]{1,2}[0-9][A-Z])|([A-Z][0-9]{1,2}))\s([0-9][A-Z]{2})|(BFPO\s\d{1,4})|(GIR\s0AA))$

^ matches the start of the string

$ matches the end of the string
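With the anchors in place, even an unanchored search has to cover the entire input. A minimal Java sketch using Matcher.find() on the anchored expression (the class name AnchoredPostcode is hypothetical):

```java
import java.util.regex.Pattern;

class AnchoredPostcode {
    // The anchored regex from the answer: ^ and $ wrap the whole alternation group.
    static final Pattern ANCHORED = Pattern.compile(
        "^((?i)(([A-Z]{2}[0-9]{1,2})|([A-Z]{1,2}[0-9][A-Z])|([A-Z][0-9]{1,2}))\\s([0-9][A-Z]{2})"
        + "|(BFPO\\s\\d{1,4})|(GIR\\s0AA))$");

    // find() is an unanchored search, but the ^...$ anchors force a full-input match.
    static boolean isValid(String input) {
        return ANCHORED.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(isValid("G12 4NN"));
        System.out.println(isValid("G12 4NNT"));
    }
}
```

One caveat: without the MULTILINE flag, $ also matches just before a final line terminator, so input with a trailing newline would still pass; \z is the strict end-of-input anchor if that matters.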
