Jonathan Hall

Oddity with UTF8 in Perl regular expressions

I'm working on some Perl code that handles possibly malformed UTF8, and have come across an oddity with regex matching. Consider the following code:

#!/usr/bin/perl
use strict;
use warnings;
use utf8;

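# 0xFFFF_FFFF lies above the Unicode maximum of 0x10FFFF
my $string = "One \x{FFFF_FFFF} three\n";

my $re1 = qr/\x{FFFF_FFFF}/;     # the code point alone
my $re2 = qr/.*\x{FFFF_FFFF}/;   # any number of characters, then the code point
my $re3 = qr/.\x{FFFF_FFFF}/;    # exactly one character, then the code point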

print "One\n" if $string =~ $re1;
print "Two\n" if $string =~ $re2;
print "Three\n" if $string =~ $re3;

The output is:

One
Three

Why doesn't the second regular expression also match? Is there a work-around?

I'm using Perl 5.14.2.

Answers (1)

ikegami

Because of a bug that has since been fixed in Perl 5.18. The same match fails under 5.16.3 and succeeds under 5.18.2:

$ usr/perlbrew/perls/5.16.3t/bin/perl -wE'
   say "One \x{FFFF_FFFF} three\n" =~ /.*\x{FFFF_FFFF}/ ?1:0'
0

$ usr/perlbrew/perls/5.18.2t/bin/perl -wE'
   say "One \x{FFFF_FFFF} three\n" =~ /.*\x{FFFF_FFFF}/ ?1:0'
1
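A possible workaround for older perls, sketched under the assumption that only a yes/no answer is needed: because `.*` can match the empty string, `/.*\x{FFFF_FFFF}/` succeeds exactly when `/\x{FFFF_FFFF}/` does, and the question's own output shows the bare pattern (`$re1`) already matching on 5.14.2. The helper name `matches_dotstar_ffff` is hypothetical and not part of the original answer. This reproduces only the true/false result; `$&` and the match offsets may differ, since `.*` would also have consumed text before the code point on the same line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original answer): gives the same
# true/false result as $s =~ /.*\x{FFFF_FFFF}/. Since .* may match the
# empty string, the leading .* never decides whether the match succeeds,
# so it is dropped to sidestep the pre-5.18 bug.
sub matches_dotstar_ffff {
    my ($s) = @_;
    return $s =~ /\x{FFFF_FFFF}/ ? 1 : 0;
}

my $string = "One \x{FFFF_FFFF} three\n";

print "Two\n" if matches_dotstar_ffff($string);
```

On 5.18 or later the original `$re2` can be used directly, so a helper like this is only worth keeping if the code must also run on older perls.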
