Reputation: 639
I have updated this question: the issue I was originally chasing turned out to be an altogether different bug (not interesting in this context). But the second-order mistake I made while testing is something others may run into, and it produced an answer with a very interesting insight, so I'll leave this here as a question.
I was trying to track down an issue with regular expressions seemingly not matching due to leading zeros. I found that none of the following regexes matched in my command-line tests:
"005630" =~ /^0056(10|11|15|20|21|25|30|31)$/
"005630" =~ /0056(10|11|15|20|21|25|30|31)/
"005630" =~ /56(10|11|15|20|21|25|30|31)/
"005630" =~ /..56(10|11|15|20|21|25|30|31)/
"005630" =~ /..5630/
"005630" =~ /005630/
"005630" =~ /^005630$/
"005630" =~ /5630/
"005630" =~ /(0)*5630/
"005630" =~ /5630/g
"005630" =~ m/5630/g
This did match:
"x005630" =~ /0056(10|11|15|20|21|25|30|31)/
The same held for the others: once I added a leading letter, they matched.
The test code was (tested with Cygwin Perl v5.10.1 on a Cygwin bash):
perl -e "print ( "005630" =~ /0056(10|11|15|20|21|25|30|31)/)" # does not display a true value
perl -e "print ( "x005630" =~ /0056(10|11|15|20|21|25|30|31)/)" # displays a true value
The quoting here is obviously a mistake (you can't use an unescaped " in a string quoted with "). But I still didn't understand why the second line works despite the incorrect quoting.
Note: This could also occur in other situations without regular expressions.
Upvotes: 4
Views: 313
Reputation: 62099
The reason why, given the commands
perl -e "print ( "005630" =~ /0056(10|11|15|20|21|25|30|31)/)"
perl -e "print ( "x005630" =~ /0056(10|11|15|20|21|25|30|31)/)"
only the second line prints a match is that Perl supports octal numeric literals. As you figured out, your shell is eating the quotes: it removes them and joins the adjacent quoted and unquoted pieces into a single argument, so you're actually executing the statements:
print ( 005630 =~ /0056(10|11|15|20|21|25|30|31)/);
print ( x005630 =~ /0056(10|11|15|20|21|25|30|31)/);
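One way to check what Perl actually receives is to hand the same mis-quoted text to the shell's own `set --` instead of `perl -e`. This is only a sketch, assuming a POSIX-style shell such as bash; the variable names `count`, `first` and `second` are made up for illustration.

```shell
# Adjacent quoted and unquoted pieces are joined into ONE argument,
# and the quote characters themselves are removed by the shell.
set -- "print ( "005630" =~ /0056(10|11|15|20|21|25|30|31)/)"
count=$#      # how many arguments the shell produced
first=$1      # the text Perl would have been given

set -- "print ( "x005630" =~ /0056(10|11|15|20|21|25|30|31)/)"
second=$1

printf '%s\n' "$count" "$first" "$second"
```

Printing the arguments shows a single argument in each case, with `005630` and `x005630` left bare inside the Perl code.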
Any numeric literal (an unquoted number) that begins with a zero that isn't immediately followed by a decimal point is treated as an octal number, so 005630 is read as octal 5630, which is 2968 in decimal.
perl -e "print 005630 . ''" # prints 2968
perl -e "print x005630 . ''" # prints x005630
(The . '' is needed here to ensure that the bareword is treated as a string. The =~ operator does that in your example.)
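The 2968 can be checked by hand by expanding the octal digits 5, 6, 3, 0 by their place values. The snippet below is just that arithmetic written out in shell (the name `octal_value` is illustrative), followed by a check of whether the resulting digits contain the `0056` the regex looks for.

```shell
# 5*8^3 + 6*8^2 + 3*8^1 + 0*8^0, with the powers of 8 written out
octal_value=$(( 5*512 + 6*64 + 3*8 + 0 ))
echo "$octal_value"   # 2968

# The regex is applied to these digits, not to "005630".
case "$octal_value" in
  *0056*) contains=yes ;;
  *)      contains=no ;;
esac
echo "contains 0056: $contains"
```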
So the reason your regex doesn't match is that your string doesn't contain what you think it does: Perl is matching against the number 2968 (stringified as "2968"), not against "005630".
Upvotes: 10