Reputation: 79556
I'm working on some Perl code that handles possibly malformed UTF-8, and I have come across an oddity with regex matching. Consider the following code:
#!/usr/bin/perl
use strict;
use warnings;
use utf8;
my $string = "One \x{FFFF_FFFF} three\n";
my $re1 = qr/\x{FFFF_FFFF}/;
my $re2 = qr/.*\x{FFFF_FFFF}/;
my $re3 = qr/.\x{FFFF_FFFF}/;
print "One\n" if $string =~ $re1;
print "Two\n" if $string =~ $re2;
print "Three\n" if $string =~ $re3;
The output is:
One
Three
Why doesn't the second regular expression match as well? Is there a workaround?
I'm using Perl 5.14.2.
Upvotes: 2
Views: 127
Reputation: 385645
This is caused by a bug that has already been fixed in 5.18:
$ usr/perlbrew/perls/5.16.3t/bin/perl -wE'
say "One \x{FFFF_FFFF} three\n" =~ /.*\x{FFFF_FFFF}/ ?1:0'
0
$ usr/perlbrew/perls/5.18.2t/bin/perl -wE'
say "One \x{FFFF_FFFF} three\n" =~ /.*\x{FFFF_FFFF}/ ?1:0'
1
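To check whether a particular perl is affected, a minimal sketch along these lines may help. It runs the three patterns from the question against the same string and prints the interpreter version next to the results. The helper name `match_report` and the pattern labels are made up for illustration, and the 5.18 cutoff is an assumption taken from the two runs above (5.16.3 fails, 5.18.2 succeeds), not a verified boundary.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The three patterns from the question, kept in their original order.
# The labels are illustrative only.
my @patterns = (
    [ 'literal',    qr/\x{FFFF_FFFF}/   ],
    [ 'dot-star',   qr/.*\x{FFFF_FFFF}/ ],
    [ 'single dot', qr/.\x{FFFF_FFFF}/  ],
);

# Hypothetical helper: returns one [label, matched] pair per pattern,
# where matched is 1 or 0.
sub match_report {
    my ($string) = @_;
    return map { [ $_->[0], ( $string =~ $_->[1] ) ? 1 : 0 ] } @patterns;
}

# $] is the interpreter version as a number, e.g. 5.014002 for 5.14.2.
# Assumption: anything below 5.018 may still carry the bug.
printf "perl %vd (%s)\n", $^V,
    $] < 5.018 ? "older than 5.18" : "5.18 or later";

for my $result ( match_report("One \x{FFFF_FFFF} three\n") ) {
    printf "%-10s %s\n", $result->[0], $result->[1] ? "match" : "no match";
}
```

If the "dot-star" line reports "no match" while the other two report "match", the perl in use shows the behavior described in the question.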
Upvotes: 2