Claude Frantz

Reputation: 101

UTF-8 to UTF-16 in Perl

I want to convert a UTF-8 string to a UTF-16BE string in hex notation.

As an example, let's say I have the string "C'est-à-dire que ça c'est l'été."

say sprintf("%vX", $string);  # 43.27.65.73.74.2D.E0.2D...

It should be converted to

00430027006500730074002d00e0002d...

I'm using

use Encode qw(decode encode);
use feature 'unicode_strings';

So far, I have not been successful using "encode" and "unpack".

What is the right way to go?

Upvotes: 0

Views: 732

Answers (2)

ikegami

Reputation: 386706

First of all, your string isn't encoded using UTF-8 as you claim. "à" (U+E0) encoded using UTF-8 would be C3 A0, but you have E0. I'm guessing you have decoded text aka a string of Unicode Code Points. (That would be a good thing. You normally want to work with decoded text.)
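The difference can be checked with the same %vX format used in the question. This is only a minimal sketch; the variable names $bytes and $chars are illustrative and do not come from the question.

```perl
use strict;
use warnings;
use Encode qw( decode );

# UTF-8 encoded bytes for "à" (U+E0): two bytes, C3 A0.
my $bytes = "\xC3\xA0";
my $bytes_hex = sprintf("%vX", $bytes);   # one element per byte

# Decoding turns the bytes into a string of Unicode code points.
my $chars = decode("UTF-8", $bytes);
my $chars_hex = sprintf("%vX", $chars);   # one element per code point

print "encoded: $bytes_hex\n";   # C3.A0
print "decoded: $chars_hex\n";   # E0
```

If the %vX output shows C3.A0 where E0 was expected, the string is still UTF-8 encoded and probably needs decode("UTF-8", ...) before any further conversion.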

To convert decoded text into UTF-16be, you can use

use Encode qw( encode );
my $s_utf16be = encode("UTF-16be", $s_ucp);
# "\x00\x43\x00\x27\x00\x65\x00\x73\x00\x74\x00\x2d\x00\xe0\x00\x2d..."

But you don't want UTF-16be; you want the hex representation of the UTF-16be encoding of the string.

my $s_utf16be_hex = unpack("H*", $s_utf16be);
# "00430027006500730074002d00e0002d..."
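Put together, the two steps might look like the following self-contained sketch. It assumes the input is already decoded text; the shortened sample string and the \x{E0} escape (used so the script does not depend on the source file's encoding) are choices made for this example.

```perl
use strict;
use warnings;
use Encode qw( encode );

# Decoded text: "C'est-à-dire", with "à" written as the code point U+E0.
my $s_ucp = "C'est-\x{E0}-dire";

# Encode to UTF-16BE bytes, then render those bytes as hex digits.
my $s_utf16be_hex = unpack("H*", encode("UTF-16BE", $s_ucp));

print "$s_utf16be_hex\n";
# 00430027006500730074002d00e0002d0064006900720065
```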

Upvotes: 2

Shawn

Reputation: 52644

Using sprintf and ord:

#!/usr/bin/env perl
use warnings;
use strict;
use utf8;
use feature qw/say/;

my $string = "C'est-à-dire que ça c'est l'été.";

say join("", map { sprintf "%04x", ord $_ } split(//, $string));

outputs

00430027006500730074002d00e0002d00640069007200650020007100750065002000e700610020006300270065007300740020006c002700e9007400e9002e
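One caveat worth checking: formatting each code point with %04x matches UTF-16BE only for code points up to U+FFFF. Above that, UTF-16 uses a surrogate pair. The sketch below compares the two approaches on a single character; U+1F600 is an arbitrary test value chosen for this example.

```perl
use strict;
use warnings;
use Encode qw( encode );

my $string = "\x{1F600}";   # a code point above U+FFFF

# Per-code-point formatting: five hex digits, which is not UTF-16.
my $by_ord = join("", map { sprintf "%04x", ord $_ } split(//, $string));

# UTF-16BE encoding: a surrogate pair, i.e. two 16-bit units.
my $by_encode = unpack("H*", encode("UTF-16BE", $string));

print "$by_ord\n";      # 1f600
print "$by_encode\n";   # d83dde00
```

For text limited to the Basic Multilingual Plane, such as the French sentence in the question, both approaches give the same result.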

Upvotes: 1
