John Deighan

How to use Unicode in a Perl regexp

Is it possible to get the correct output from this script?

use strict;
use warnings;
my $str = "wo3";
$str =~ s/wo3/wǒ/g;
print("'$str'\n");

I expect to see 'wǒ', not 'wo'.

Answers (2)

daxim

Use modules that already convert pinyin to tone marks, and save yourself the tedious work.

› perl -mCEDict::Pinyin -E'
    say CEDict::Pinyin->new("wo3")->diacritic
'
wǒ

› perl -Mutf8 -mLingua::Han::PinYin -E'
    say Lingua::Han::PinYin
        ->new(unicode => 1)
        ->han2pinyin("我")
'
wǒ

Modules used: CEDict::Pinyin, Lingua::Han::PinYin
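If you would rather not add a dependency, the kind of conversion those modules perform can be sketched by hand. This is a minimal sketch, not either module's implementation, assuming lowercase numbered pinyin with tone digits 1 to 5 (5 = neutral) and `v` as an ASCII stand-in for ü; `pinyin_syllable` is a hypothetical helper name. It uses the usual tone-mark placement rule: a or e takes the mark, otherwise the o of "ou", otherwise the last vowel.

```perl
use strict;
use warnings;
use utf8;                                 # this file contains literal tone-marked vowels
use open qw/:std :encoding(UTF-8)/;       # encode STDOUT etc. as UTF-8

# Tone-marked form of each vowel for tones 1..4 (index 0 unused).
my %marked = (
    a   => [ undef, qw(ā á ǎ à) ],
    e   => [ undef, qw(ē é ě è) ],
    i   => [ undef, qw(ī í ǐ ì) ],
    o   => [ undef, qw(ō ó ǒ ò) ],
    u   => [ undef, qw(ū ú ǔ ù) ],
    'ü' => [ undef, qw(ǖ ǘ ǚ ǜ) ],
);

# Hypothetical helper: convert one numbered syllable ("wo3") to tone-mark form ("wǒ").
sub pinyin_syllable {
    my ($syl) = @_;
    return $syl unless $syl =~ /^([a-zü]+)([1-5])$/;
    my ($base, $tone) = ($1, $2);
    $base =~ tr/v/ü/;                     # assumption: "v" stands for ü
    return $base if $tone == 5;           # neutral tone: drop the digit, no mark

    my $pos;
    if    ($base =~ /[ae]/)     { $pos = $-[0] }      # a or e always takes the mark
    elsif ($base =~ /ou/)       { $pos = $-[0] }      # in "ou", the o takes it
    elsif ($base =~ /.*[iouü]/) { $pos = $+[0] - 1 }  # otherwise the last vowel
    else                        { return $syl }       # no vowel: leave unchanged

    my $vowel = substr($base, $pos, 1);
    substr($base, $pos, 1, $marked{$vowel}[$tone]);   # 4-arg substr replaces in place
    return $base;
}

# Apply it to every numbered syllable in a string.
my $str = "wo3 ni3hao3";
$str =~ s/([a-zü]+[1-5])/pinyin_syllable($1)/ge;
print "'$str'\n";                         # 'wǒ nǐhǎo'
```

The modules above handle more cases (capitals, full dictionaries, Han-to-pinyin lookup), so they remain the safer choice for real text.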

Shawn

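If your script contains UTF-8 text directly in its source (here, the literal ǒ), you need to tell perl that with the utf8 pragma; without it, perl treats the source as bytes and reads ǒ as two separate characters. You also need a UTF-8 encoding layer on the output handles, otherwise printing the resulting character string gives a "Wide character in print" warning: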

use utf8;
use open qw/:std :encoding(UTF-8)/; # Tell perl that STDOUT etc. should be UTF-8 encoded
use strict;
use warnings;
my $str = "wo3";
$str =~ s/wo3/wǒ/g;
print("'$str'\n");

Running it:

$ perl foo.pl
'wǒ'
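A possible variant keeps the source file pure ASCII: the `\x{01D2}` escape (ǒ is U+01D2, LATIN SMALL LETTER O WITH CARON) produces the same character without `use utf8`, although the output layer is still needed. The last two lines are only illustrative; they print the string's length in characters and in UTF-8 bytes, which is the distinction the utf8 pragma is about.

```perl
use strict;
use warnings;
use open qw/:std :encoding(UTF-8)/;       # encode STDOUT etc. as UTF-8
use Encode qw(encode);

my $str = "wo3";
$str =~ s/wo3/w\x{01D2}/g;                # ASCII-only source, no `use utf8` needed
print("'$str'\n");                        # 'wǒ'

# Character string versus its UTF-8 byte encoding:
printf "characters: %d\n", length $str;                    # 2
printf "UTF-8 bytes: %d\n", length encode('UTF-8', $str);  # 3
```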
