According to the documentation for Perl's JSON module (version 2.90), all you need to do to encode a JSON object as UTF-8 is:
$json_text = JSON->new->utf8->encode($perl_scalar);
That seemed obvious, and it is what I did. After a while, one of my users filed an issue on GitHub, which really surprised me, because this shouldn't have been happening!
I spent hours trying to figure out what was going on, but the solution turned out to be very weird, and wrong from my point of view.
What eventually worked for me is this:
$json_text = JSON->new->latin1->encode($perl_scalar);
After that, I tested this code with all kinds of characters, including Russian and Chinese, and it just worked.
Can anyone please explain why encoding works correctly with latin1 and not with utf8, when it should be the other way around?
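A minimal sketch of what the two options actually produce for a properly decoded string. It uses JSON::PP (bundled with Perl, and offering the same utf8/latin1/encode methods as JSON) as a stand-in for JSON, and the data is hypothetical. With ->utf8 the é comes out as the two UTF-8 bytes C3 A9; with ->latin1 it comes out as the single byte E9, which is not valid UTF-8 on its own.

```perl
use strict;
use warnings;
use JSON::PP;    # stand-in for JSON; same utf8/latin1/encode interface

# Hypothetical data: a properly decoded string; \x{E9} is the character é.
my $data = ["install\x{E9}"];

# ->utf8 encodes the JSON text to UTF-8 bytes: é becomes C3 A9.
my $as_utf8 = JSON::PP->new->utf8->encode($data);

# ->latin1 encodes the JSON text to Latin-1: é stays the single byte E9.
my $as_latin1 = JSON::PP->new->latin1->encode($data);

# %vX prints each character's code point in hex, separated by dots.
printf "utf8:   %vX\n", $as_utf8;
printf "latin1: %vX\n", $as_latin1;
```

So for correctly decoded input written to a raw handle, ->utf8 is the right choice; if ->latin1 looks correct instead, something else in the pipeline is encoding a second time.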
Two possible bugs could result in the described outcome.
First, you may have been passing strings that were already encoded using UTF-8 to encode.
If $string contains installé and sprintf '%vX', $string returns 69.6E.73.74.61.6C.6C.C3.A9, you are suffering from this bug.
If so, properly decode all inputs to your program, and continue using JSON->new->utf8->encode (aka encode_json).
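A sketch of the first bug, again using JSON::PP in place of JSON. The value of $raw is hypothetical: it stands in for input that was never decoded (for example, a line read from a handle with no decoding layer). Passing those raw bytes to ->utf8->encode encodes them a second time; decoding them with Encode's decode first gives the expected output.

```perl
use strict;
use warnings;
use Encode qw(decode);
use JSON::PP;    # stand-in for JSON

# Hypothetical undecoded input: the raw UTF-8 bytes of "installé".
my $raw = "install\xC3\xA9";

# The diagnostic from the answer: a byte string shows C3.A9 for the é.
printf "%vX\n", $raw;

# Bug: ->utf8 treats each byte as a character and encodes it again,
# so C3 becomes C3 83 and A9 becomes C2 A9 (double-encoded).
my $double = JSON::PP->new->utf8->encode([$raw]);

# Fix: decode inputs into characters first, then encode exactly once.
my $text  = decode('UTF-8', $raw);
my $fixed = JSON::PP->new->utf8->encode([$text]);
```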
Second, you may have been encoding the output of the JSON encoder using UTF-8 a second time, possibly via a :utf8 or :encoding layer on a file handle.
If $string contains installé and sprintf '%vX', $string returns 69.6E.73.74.61.6C.6C.E9, you are suffering from this bug.
If so, either use JSON->new->encode (aka to_json) and keep the second layer of encoding, or use JSON->new->utf8->encode (aka encode_json) and remove the second layer of encoding.
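A sketch of the second bug and both fixes. It assumes JSON::PP in place of JSON and uses in-memory filehandles as a stand-in for the real output handle; write_through is a hypothetical helper that returns the bytes that end up in the handle for a given PerlIO layer.

```perl
use strict;
use warnings;
use JSON::PP;    # stand-in for JSON

# Properly decoded data; %vX shows E9 for the é, as in the answer.
my $data = ["install\x{E9}"];
printf "%vX\n", $data->[0];

# Hypothetical helper: print $text to an in-memory handle opened with
# $layer (e.g. ':raw' or ':encoding(UTF-8)') and return the stored bytes.
sub write_through {
    my ($layer, $text) = @_;
    my $buf = '';
    open my $fh, ">$layer", \$buf or die "open: $!";
    print {$fh} $text;
    close $fh or die "close: $!";
    return $buf;
}

# Bug: ->utf8 already produced UTF-8 bytes; the layer encodes them again.
my $double = write_through(':encoding(UTF-8)', JSON::PP->new->utf8->encode($data));

# Fix A: keep the layer and emit characters (no ->utf8), like to_json.
my $fix_a = write_through(':encoding(UTF-8)', JSON::PP->new->encode($data));

# Fix B: emit UTF-8 bytes (like encode_json) to a handle with no encoding layer.
my $fix_b = write_through(':raw', JSON::PP->new->utf8->encode($data));
```

Either fix leaves exactly one UTF-8 encoding step between the Perl data and the bytes on the handle.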
In neither case is the solution to use JSON->new->latin1->encode.
How are you outputting $json_text? What binmode do you use on that handle? The screenshot looks double-encoded, which suggests the handle has :utf8 or :encoding enabled (which is incorrect when writing already-encoded data). Unintuitive as it may seem, ->latin1 giving a correct result matches that hypothesis (PerlIO assumes any binary string is encoded as Latin-1).
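A sketch of why ->latin1 appears to work, assuming JSON::PP in place of JSON and an in-memory handle with an :encoding(UTF-8) layer standing in for the misconfigured output handle; the sample characters are hypothetical. ->latin1 escapes every character above code point 255 as \uXXXX, so Russian and Chinese text pass through as escapes, while é stays a single character that the layer then encodes correctly as C3 A9.

```perl
use strict;
use warnings;
use JSON::PP;    # stand-in for JSON

# Decoded characters: é (U+00E9), Cyrillic De (U+0414), CJK U+4E2D.
my $data = ["\x{E9}", "\x{414}", "\x{4E2D}"];

# ->latin1 keeps code points 0..255 as-is and escapes anything above 255
# as \uXXXX, so the resulting string contains only code points 0..255.
my $latin1 = JSON::PP->new->latin1->encode($data);

# Writing through a UTF-8 encoding layer turns é into C3 A9, which is
# why the output looked correct despite the extra layer on the handle.
my $buf = '';
open my $fh, '>:encoding(UTF-8)', \$buf or die "open: $!";
print {$fh} $latin1;
close $fh or die "close: $!";
```

The result is valid JSON, but only because the handle's encoding layer is doing the UTF-8 work; fixing the input decoding or the handle's layer remains the proper solution.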