Kirill

Reputation: 1472

How to convert a character string to hexadecimal in Perl (16 bit per character)

I read the post "How to convert a hexadecimal number to a char string in Perl", which explains how to convert a hexadecimal number to a character string.

How can I do the reverse operation? I need to convert a character string to hexadecimal in Perl. For example, I have the string "hello world!", and I must get:

00680065006C006C006F00200077006F0072006C00640021

Upvotes: 8

Views: 13023

Answers (3)

ikegami

Reputation: 385556

You appear to want

use Encode qw( encode );

my $text = 'hello world!';
my $hex = uc unpack 'H*', encode 'UTF-16be', $text;

An explanation follows.


The existing answers provide the hexadecimal representation of the Unicode Code Points.

That format doesn't permit the input to include any characters above 0xFFFF. If it were to permit this, there wouldn't be any way to know if

20000200002000020000

means

2000 0200 0020 0002 0000

or

20000 20000 20000 20000
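The ambiguity can be reproduced with a short sketch. The helper name code_point_hex is hypothetical; it simply wraps the sprintf-based conversion used by the other answers. Two different strings produce the same hexadecimal output:

```perl
use strict;
use warnings;

# Hypothetical helper: hex of each code point, at least four digits each,
# as produced by the sprintf-based answers.
sub code_point_hex {
    my ($text) = @_;
    return join '', map { sprintf '%04X', ord } split //, $text;
}

# Four characters above 0xFFFF (five hex digits each)...
my $four_chars = "\x{20000}" x 4;
# ...and five characters at or below 0xFFFF (four hex digits each).
my $five_chars = "\x{2000}\x{0200}\x{0020}\x{0002}\x{0000}";

print code_point_hex($four_chars), "\n";   # 20000200002000020000
print code_point_hex($five_chars), "\n";   # 20000200002000020000
```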

If that's fine because you'll never have characters above 0xFFFF, then I recommend the following:

my $text = 'hello world!';
my $hex = uc unpack 'H*', pack 'n*', unpack 'W*', $text;

It should be much faster than the existing solutions, and it handles characters above 0xFFFF better, since it still emits exactly four hexadecimal digits for them and the output stays fixed-width.
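As a rough illustration of the fixed-width behaviour, the one-liner can be wrapped in a sub (fixed16_hex is a hypothetical name, not part of any module) and tried on a character that does not fit in 16 bits:

```perl
use strict;
use warnings;

# Hypothetical wrapper: one 16-bit big-endian unit per character.
sub fixed16_hex {
    my ($text) = @_;
    return uc unpack 'H*', pack 'n*', unpack 'W*', $text;
}

print fixed16_hex('hello world!'), "\n";
# prints 00680065006C006C006F00200077006F0072006C00640021

# A code point above 0xFFFF does not fit in 16 bits: the output still has
# four digits, but the original code point cannot be recovered from it.
print length(fixed16_hex("\x{20000}")), "\n";   # 4
```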


If, however, you want to handle all Unicode Code Points, the above solution and the solution provided by the earlier answers aren't adequate.

With that in mind, I suspect you actually want the hexadecimal representation of the UTF-16be encoding of the Unicode Code Points. At worst, having a character above 0xFFFF will still produce useful and lossless output.

Code Point    Perl string lit  JSON string lit  Hex of UCP  Hex of UTF-16be
------------  ---------------  ---------------  ----------  ---------------
h  (U+0068)   "\x{68}"         "\u0068"         0068        0068
é  (U+00E9)   "\x{E9}"         "\u00E9"         00E9        00E9
ጀ  (U+1300)   "\x{1300}"       "\u1300"         1300        1300
𠀀  (U+20000)  "\x{20000}"      "\uD840\uDC00"   20000       D840DC00
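The last column of the table can be reproduced with a small sketch. The sub name utf16be_hex is hypothetical; encode comes from the core Encode module:

```perl
use strict;
use warnings;
use Encode qw( encode );

# Hypothetical helper: hex of the UTF-16be encoding of a single code point.
sub utf16be_hex {
    my ($code_point) = @_;
    return uc unpack 'H*', encode('UTF-16be', chr $code_point);
}

# Print each code point from the table next to its UTF-16be hex.
printf "U+%04X  %s\n", $_, utf16be_hex($_) for 0x68, 0xE9, 0x1300, 0x20000;
```

The code point above 0xFFFF comes out as the surrogate pair D840DC00, while the others keep their four-digit form.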

If that's the case, you want

use Encode qw( encode );

my $text = 'hello world!';
my $hex = uc unpack 'H*', encode 'UTF-16be', $text;
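One way to check that this output is lossless is to decode it again. This is only a sketch; the sample string with a character above 0xFFFF is an assumption chosen for illustration:

```perl
use strict;
use warnings;
use Encode qw( encode decode );

# Sample input containing a character above 0xFFFF (illustrative only).
my $text = "hello \x{20000}!";

# Forward: characters -> UTF-16be bytes -> uppercase hex.
my $hex = uc unpack 'H*', encode('UTF-16be', $text);

# Reverse: hex -> bytes -> characters.
my $back = decode('UTF-16be', pack('H*', $hex));

print "$hex\n";   # 00680065006C006C006F0020D840DC000021
print $back eq $text ? "round trip ok\n" : "round trip failed\n";
```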

Upvotes: 8

simbabque

Reputation: 54323

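One algorithm you can use to do this is: split the string into individual characters, get each character's code point with ord, format each code point as four uppercase hexadecimal digits with sprintf '%04X', and concatenate the results.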

A possible implementation could be

print map { sprintf '%04X', ord } split //, 'hello world!';

The output of this program is

00680065006C006C006F00200077006F0072006C00640021

That said, there is probably a pack implementation that I am not aware of.

Upvotes: 7

Dave Cross

Reputation: 69224

Here's another approach. Do it all in one go with a regex.

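my $string = 'hello world!';
# /e evaluates the replacement as code, /s lets . match newlines too;
# %04X gives the uppercase digits shown in the question.
$string =~ s/(.)/sprintf '%04X', ord $1/seg;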
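Note that the substitution rewrites $string in place. If the original text is still needed, a variant along these lines works on a copy instead. This is a sketch only; it uses %04X to match the uppercase output in the question, and the variable name $hex is an assumption:

```perl
use strict;
use warnings;

my $string = 'hello world!';

# Copy $string into $hex, then run the substitution on the copy only.
(my $hex = $string) =~ s/(.)/sprintf '%04X', ord $1/seg;

print "$string\n";   # hello world!
print "$hex\n";      # 00680065006C006C006F00200077006F0072006C00640021
```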

Upvotes: 12
