Reputation: 101
I want to convert a UTF-8 string to a UTF-16BE string in hex notation.
As an example, let's say I have the string "C'est-à-dire que ça c'est l'été."
say sprintf("%vX", $string); # 43.27.65.73.74.2D.E0.2D...
It should be converted to
00430027006500730074002d00e0002d...
I'm using
use Encode qw(decode encode);
use feature 'unicode_strings' ;
So far, I have not had any success using "encode" and "unpack".
What is the right way to go?
Upvotes: 0
Views: 732
Reputation: 386706
First of all, your string isn't encoded using UTF-8 as you claim. "à" (U+E0) encoded using UTF-8 would be C3 A0, but you have E0. I'm guessing you have decoded text aka a string of Unicode Code Points. (That would be a good thing. You normally want to work with decoded text.)
To convert decoded text into UTF-16be, you can use
use Encode qw( encode );
my $s_utf16be = encode("UTF-16be", $s_ucp);
# "\x00\x43\x00\x27\x00\x65\x00\x73\x00\x74\x00\x2d\x00\xe0\x00\x2d..."
But you don't want UTF-16be; you want the hex representation of the UTF-16be encoding of the string.
my $s_utf16be_hex = unpack("H*", $s_utf16be);
# "00430027006500730074002d00e0002d..."
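Putting the two steps together, a minimal self-contained sketch might look like the following. The helper name utf16be_hex and the shortened sample string are only illustrative, and the sketch assumes the input is already decoded text; if you actually hold raw UTF-8 bytes, decode them first with decode("UTF-8", $bytes).

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;                    # the string literal below is written in UTF-8
use Encode qw( encode );

# Hypothetical helper: hex representation of the UTF-16BE encoding
# of a string of decoded text (Unicode Code Points).
sub utf16be_hex {
    my ($text) = @_;
    return unpack("H*", encode("UTF-16BE", $text));
}

my $s_ucp = "C'est-à-dire";
print utf16be_hex($s_ucp), "\n";
# 00430027006500730074002d00e0002d0064006900720065
```

Only ASCII hex digits are printed, so no output encoding layer is needed for this.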
Upvotes: 2
Reputation: 52644
Using sprintf and ord:
#!/usr/bin/env perl
use warnings;
use strict;
use utf8;
use feature qw/say/;
my $string = "C'est-à-dire que ça c'est l'été.";
say join("", map { sprintf "%04x", ord $_ } split(//, $string));
outputs
00430027006500730074002d00e0002d00640069007200650020007100750065002000e700610020006300270065007300740020006c002700e9007400e9002e
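One caveat worth keeping in mind: formatting each code point with "%04x" matches UTF-16BE only for characters up to U+FFFF. Above that, UTF-16 uses a surrogate pair, while sprintf simply prints a five- or six-digit number. The following sketch, using U+1F600 as an arbitrary example character, compares the two approaches:

```perl
use strict;
use warnings;
use Encode qw( encode );

my $char = "\x{1F600}";    # a code point above U+FFFF

# Per-character formatting: prints the code point itself.
my $via_sprintf = join "", map { sprintf "%04x", ord($_) } split //, $char;

# Real UTF-16BE encoding: produces a surrogate pair.
my $via_encode = unpack "H*", encode("UTF-16BE", $char);

print "$via_sprintf\n";    # 1f600
print "$via_encode\n";     # d83dde00
```

For the sample string in the question, every character is below U+FFFF, so both approaches give the same result.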
Upvotes: 1