Gansi

Reputation: 11

Perl unicode conversion

I'm using this code:

use Unicode::UTF8 qw[decode_utf8 encode_utf8];
my $d = "opposite Spencer\u2019s Aliganj, Lucknow";
my $string = decode_utf8($d);
my $octets = encode_utf8($d);
print "\nSTRING :: $string";

I want the output to be:

opposite Spencer's Aliganj, Lucknow

What should I do?

Upvotes: 0

Views: 1461

Answers (2)

ikegami

Reputation: 385789

You're trying to parse butchered JSON.

You could parse it yourself.

use Encode qw( decode );

# Single quotes keep the backslash; in double quotes Perl consumes \u itself.
my $incomplete_json = 'opposite Spencer\u2019s Aliganj, Lucknow';

my $string = $incomplete_json;
$string =~ s{\\u([dD][89aAbB]..)\\u([dD][cCdDeEfF]..)|\\u(....)}
            { $1 ? decode('UTF-16be', pack('H*', $1.$2)) : chr(hex($3)) }eg;

Or you could fix it up, then use an existing parser:

use JSON::XS qw( decode_json );

# Single quotes keep the backslash; in double quotes Perl consumes \u itself.
my $incomplete_json = 'opposite Spencer\u2019s Aliganj, Lucknow';

my $json = $incomplete_json;
$json =~ s/"/\\"/g;
$json = qq{["$json"]};

my $string = decode_json($json)->[0];

Untested. You may have to handle other backslashes. Which solution is simpler depends on how you have to handle them.
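As a rough sketch of the first approach, the substitution can be wrapped in a sub and checked against both a plain escape and a surrogate pair. The name unescape_u is hypothetical, the input is single-quoted so the backslash actually reaches the regex, and the explicit hex-digit classes are a stricter variation on the `.` placeholders used above.

```perl
use strict;
use warnings;
use Encode qw( decode );

# Hypothetical helper: turns JSON-style \uXXXX escapes into characters.
sub unescape_u {
    my ($text) = @_;
    $text =~ s{
        \\u ([dD][89aAbB][0-9a-fA-F]{2})   # high surrogate
        \\u ([dD][c-fC-F][0-9a-fA-F]{2})   # low surrogate
      | \\u ([0-9a-fA-F]{4})               # any other four-hex-digit escape
    }{
        defined $1
            ? decode('UTF-16BE', pack('H*', $1 . $2))   # pair: decode as UTF-16
            : chr(hex($3))                              # single unit: direct code point
    }egx;
    return $text;
}

my $plain  = unescape_u('opposite Spencer\u2019s Aliganj, Lucknow');
my $astral = unescape_u('\ud83d\ude00');

# Print code points rather than the characters, so no output layer is needed.
printf "U+%04X\n", ord(substr($plain, 16, 1));
printf "U+%04X\n", ord($astral);
```

To print the decoded string itself, the output handle still needs an encoding layer, for example binmode(STDOUT, ':encoding(UTF-8)').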

Upvotes: 0

Suic

Reputation: 2501

If you just want Unicode code point U+2019 to become ’, you can use one of these ways:

use strict;
use warnings;
use open ':std', ':encoding(utf-8)';
print chr(0x2019);
print "\x{2019}";  # for characters 0x100 and above
print "\N{U+2019}";

In Perl, \u and \U in interpolated strings are case-translation escapes, not Unicode escapes:

Case translation operators use the Unicode case translation tables when character input is provided. Note that uc(), or \U in interpolated strings, translates to uppercase, while ucfirst, or \u in interpolated strings, translates to titlecase in languages that make the distinction (which is equivalent to uppercase in languages without the distinction).
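A small sketch of what that means for the string in the question, using only core Perl; the variable names are illustrative. It prints lengths rather than the strings themselves, to sidestep output encoding.

```perl
use strict;
use warnings;

# Double quotes: \u titlecases the next character. "2" has no case, so the
# escape vanishes and the digits 2019 are left as ordinary text.
my $interpolated = "Spencer\u2019s";

# Single quotes: the backslash and the "u" survive as two literal characters.
my $literal = 'Spencer\u2019s';

# \x{2019} (or \N{U+2019}) is how Perl spells the code point in a string.
my $intended = "Spencer\x{2019}s";

printf "%-12s length %2d\n", 'interpolated', length $interpolated;
printf "%-12s length %2d\n", 'literal',      length $literal;
printf "%-12s length %2d\n", 'intended',     length $intended;

# With a letter after it, the effect of \u is visible.
print "\uabc\n";
```

So the double-quoted string in the question never contained a backslash escape to decode in the first place.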

Upvotes: 1
