nfpyfzyf

Reputation: 2981

Perl regexp with utf8 chars

I have these two files: t.pl and t:

$ file t.pl t
t.pl: UTF-8 Unicode text
t:    UTF-8 Unicode text
$ cat t
日本

t.pl has three versions:

case 1

use strict;
use warnings;
use utf8;

$_='日本';
if(/日/){
    print "match!\n";
}

and perl t.pl outputs match!

case 2

use strict;
use warnings;
use utf8;

while(<DATA>){
    chomp;
    if(/日/){
        print "match!\n";
    }
}
__DATA__
日本

This also prints match!

then case 3

use strict;
use warnings;
use utf8;

while(<>){
    chomp;
    if(/日/){
        print "match!\n";
    }
}

But perl t.pl t does not print match!

So what's wrong with case 3?

Upvotes: 1

Views: 235

Answers (1)

choroba

Reputation: 241868

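You have to set the encoding for the input; use utf8 does not do that for you. It only tells Perl that the source file itself (and so the literal in your regex) is UTF-8. Without an input layer, the line read from t is a string of raw bytes, which does not match the decoded character 日. Just insert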

use open IN => ":utf8";

before the loop. See the documentation of the open pragma (perldoc open) for details.
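
For reference, here is a rough sketch of case 3 with that line added. The helper name print_matches is made up for illustration, and wrapping the loop in a sub is my own choice (it just makes the loop easy to call on a list of files); the loop body itself is unchanged from the question.

```perl
use strict;
use warnings;
use utf8;                 # the source file (and the regex literal) is UTF-8
use open IN => ':utf8';   # default layer for input handles opened in this scope

# Hypothetical helper: prints "match!" for every line containing 日
# in the named files and returns how many lines matched.
sub print_matches {
    local @ARGV = @_;     # <> reads the files listed in @ARGV
    my $count = 0;
    while (<>) {
        chomp;
        if (/日/) {
            print "match!\n";
            $count++;
        }
    }
    return $count;
}

# Only run when files were given; the original loop would fall back to STDIN.
print_matches(@ARGV) if @ARGV;
```

Run it the same way as before (perl t.pl t). Note that IN only covers handles that get opened inside the scope of the pragma; as far as I know STDIN is not affected unless you also pass :std.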

Upvotes: 3
