Håkon Hægland

Reputation: 40778

Create Unicode character with pack

I am trying to understand how Perl handles Unicode.

use feature qw(say);
use strict;
use warnings;

use Encode qw(encode);

say unpack "H*", pack("U", 0xff);
say unpack "H*", encode( 'UTF-8', chr 0xff );

Output:

ff
c3bf

Why do I get ff and not c3bf when using pack?

Upvotes: 2

Views: 226

Answers (2)

Leon Timmermans

Reputation: 30235

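Why do I get ff and not c3bf when using pack?

This is because pack("U", ...) creates a character string, not a byte string. The result holds the single character U+00FF, which Perl happens to store internally as the two bytes C3 BF.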

> perl -MDevel::Peek -e 'Dump(pack("U", 0xff));'
SV = PV(0x13a6d18) at 0x13d2ce8
  REFCNT = 1
  FLAGS = (PADTMP,POK,READONLY,pPOK,UTF8)
  PV = 0xa6d298 "\303\277"\0 [UTF8 "\x{ff}"]
  CUR = 2
  LEN = 32

Hence unpack("H*") doesn't look at the bytes of the internal buffer, but at the (truncated) character values of the string. If you encode the string first:

say unpack "H*", encode("UTF-8", pack("U", 0xff));

Then you'd get the expected result.
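A minimal sketch of the difference, comparing the character string with its encoded form. The helper name hex_of is hypothetical and only wraps the unpack call from the question:

```perl
use strict;
use warnings;
use feature qw(say);
use Encode qw(encode);

# Hypothetical helper: hex dump of the string elements as unpack "H*" sees them.
sub hex_of { return unpack "H*", $_[0] }

my $chars = pack("U", 0xff);           # one character, code point U+00FF
my $bytes = encode("UTF-8", $chars);   # two bytes, the UTF-8 encoding of U+00FF

say length($chars), " ", hex_of($chars);   # 1 ff
say length($bytes), " ", hex_of($bytes);   # 2 c3bf
```

The length of each string shows what unpack is iterating over: one character in the first case, two bytes in the second.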

See also this thread.

Upvotes: 2

ikegami

Reputation: 386501

pack('U', 0xFF)

is just a weird way of doing

chr(0xFF)

so

"\xFF"                             returns chars   FF
chr(0xFF)                          returns chars   FF
pack('U', 0xFF)                    returns chars   FF

"\xC3\xBF"                         returns chars   C3 BF
encode('UTF-8', chr(0xFF))         returns chars   C3 BF
encode('UTF-8', pack('U', 0xFF))   returns chars   C3 BF

so

say unpack "H*", "\xFF";                             # outputs   ff
say unpack "H*", chr(0xFF);                          # outputs   ff
say unpack "H*", pack('U', 0xFF);                    # outputs   ff

say unpack "H*", "\xC3\xBF";                         # outputs   c3bf
say unpack "H*", encode('UTF-8', pack('U', 0xFF));   # outputs   c3bf
say unpack "H*", encode('UTF-8', chr(0xFF));         # outputs   c3bf
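
As a quick check of the equivalence claimed above, this sketch compares the two strings directly and then encodes a copy in place. The variable names are only illustrative, and utf8::encode is used here as an alternative to Encode's encode, which the original code does not do:

```perl
use strict;
use warnings;
use feature qw(say);

my $packed = pack('U', 0xFF);
my $chr    = chr(0xFF);

# eq compares the strings character by character, whatever their internal storage.
say $packed eq $chr ? "same string" : "different strings";   # same string
say length($packed);                                          # 1

# utf8::encode converts a copy in place to its UTF-8 bytes.
my $encoded = $packed;
utf8::encode($encoded);
say unpack "H*", $encoded;                                    # c3bf
```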

Upvotes: 2
