FelixD

Reputation: 639

Perl regexps not matching string with leading zeros / incorrectly escaped numerals with leading zeros on command line in Perl

I have updated this question: the issue I was originally chasing turned out to be an altogether different bug (not interesting in this context). But the second-order mistake I made while testing is something others may run into, and it produced an answer with a very interesting insight, so I'll leave this here as a question.

I was trying to track down an issue with regular expressions seemingly not matching because of leading zeros. I found that none of the following regexps matched in my command-line tests:

"005630" =~ /^0056(10|11|15|20|21|25|30|31)$/
"005630" =~ /0056(10|11|15|20|21|25|30|31)/  
"005630" =~ /56(10|11|15|20|21|25|30|31)/
"005630" =~ /..56(10|11|15|20|21|25|30|31)/
"005630" =~ /..5630/
"005630" =~ /005630/
"005630" =~ /^005630$/
"005630" =~ /5630/
"005630" =~ /(0)*5630/
"005630" =~ /5630/g
"005630" =~ m/5630/g

This did match:

"x005630" =~ /0056(10|11|15|20|21|25|30|31)/

The same holds for the others: once I add a leading letter to the string, they match.

The test code was (tested with Cygwin Perl v5.10.1 on a Cygwin bash):

perl -e "print ( "005630" =~ /0056(10|11|15|20|21|25|30|31)/)"   # does not display a true value
perl -e "print ( "x005630" =~ /0056(10|11|15|20|21|25|30|31)/)"  # displays a true value

The quoting here is obviously a mistake (can't use unescaped " in a string quoted with "). But I still didn't understand why the second line works despite incorrect quoting.

Note: This could also occur in other situations without regular expressions.

Upvotes: 4

Views: 313

Answers (1)

cjm

Reputation: 62099

Given the commands

perl -e "print ( "005630" =~ /0056(10|11|15|20|21|25|30|31)/)"
perl -e "print ( "x005630" =~ /0056(10|11|15|20|21|25|30|31)/)"

only the second line prints a match, because Perl supports octal numeric literals. As you figured out, your shell is eating the inner quotes, so you're actually executing the statements:

print ( 005630 =~ /0056(10|11|15|20|21|25|30|31)/);
print ( x005630 =~ /0056(10|11|15|20|21|25|30|31)/);
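
To see what the shell actually hands over, you can swap perl for something that just echoes its argument. This is only a sketch, assuming a POSIX-style shell such as bash; show_arg is a made-up helper name, not a standard command:

```shell
# Hypothetical helper: prints its first argument exactly as the shell passed it.
show_arg() { printf '%s\n' "$1"; }

# Same quoting as in the question. The shell glues the quoted and unquoted
# pieces into one word and drops the double quotes while doing so.
show_arg "print ( "005630" =~ /0056(10|11|15|20|21|25|30|31)/)"
show_arg "print ( "x005630" =~ /0056(10|11|15|20|21|25|30|31)/)"
```

Neither line of output contains a double quote any more, so Perl never sees a string literal.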

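Any numeric literal (an unquoted number) that begins with a zero followed by further digits is treated as an octal number (a leading 0x or 0b means hexadecimal or binary instead). So 005630 is the number 2968, whereas x005630 is not a number at all but a bareword.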

perl -e "print 005630 . ''"  # prints 2968
perl -e "print x005630 . ''" # prints x005630

(The . '' is needed here to ensure that the bareword is treated as a string. The =~ operator does that in your example.)

So the reason your regex doesn't match is that your string doesn't contain what you think it does: =~ is matching against the string "2968", which contains no "0056".
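
One way around it, assuming a POSIX-style shell such as bash, is to put the Perl code in single quotes so that the inner double quotes reach Perl untouched. The sketch below runs the same regex against the quoted string and against the bare literal; the variable names fixed and broken are just for illustration:

```shell
# Single quotes protect the inner double quotes, so Perl sees the string "005630".
fixed=$(perl -e 'print "match" if "005630" =~ /0056(10|11|15|20|21|25|30|31)/')

# Without the inner quotes, 005630 is an octal literal (2968) and the regex fails.
broken=$(perl -e 'print "match" if 005630 =~ /0056(10|11|15|20|21|25|30|31)/')

printf 'quoted string: [%s]\n' "$fixed"
printf 'bare literal:  [%s]\n' "$broken"
```

The first line should show [match] and the second an empty [].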

Upvotes: 10
