Reputation: 489
I am a bit confused after reading the JSON::XS docs. My question is: how can I encode/decode data that is already in UTF-8? Calling encode_json seems to double-encode it. I would like to create JSON from a hash that contains UTF-8 encoded strings, and decode JSON into a hash while keeping the data UTF-8 encoded. Is this possible, or do I need to manually call Encode::decode_utf8/encode_utf8 on the data myself?
Upvotes: 1
Views: 804
Reputation: 19216
See the perldoc for JSON::XS:
utf8 flag disabled
When utf8 is disabled (the default), then encode/decode generate and expect Unicode strings, that is, characters with high ordinal Unicode values (> 255) will be encoded as such characters, and likewise such characters are decoded as-is, no changes to them will be done, except "(re-)interpreting" them as Unicode codepoints or Unicode characters, respectively (to Perl, these are the same thing in strings unless you do funny/weird/dumb stuff).
This is useful when you want to do the encoding yourself (e.g. when you want to have UTF-16 encoded JSON texts) or when some other layer does the encoding for you (for example, when printing to a terminal using a filehandle that transparently encodes to UTF-8 you certainly do NOT want to UTF-8 encode your data first and have Perl encode it another time).
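For the case in the question, here is a minimal sketch of what the disabled mode does, assuming the hash values are UTF-8 encoded byte strings (the key `name` and the sample value are made up for illustration). With the flag off, each input byte is treated as one character and passed through unchanged, so nothing gets encoded a second time:

```perl
use strict;
use warnings;
use JSON::XS;

# Assumption: values are UTF-8 encoded octets, not decoded character strings.
my $bytes = "caf\xc3\xa9";        # "café" as UTF-8 octets

my $json = JSON::XS->new;         # utf8 flag is disabled by default

# Each byte is treated as a single character and copied through as-is.
my $text = $json->encode({ name => $bytes });
print $text eq qq({"name":"caf\xc3\xa9"}) ? "bytes preserved\n" : "bytes changed\n";

# Decoding with the same object returns the same sequence of byte values.
my $back = $json->decode($text);
print $back->{name} eq $bytes ? "round trip ok\n" : "round trip differs\n";
```

One caveat with this approach: a `\uXXXX` escape in the incoming JSON still decodes to a single character, which may be above 255, so a decoded value can end up mixing raw UTF-8 bytes with real Unicode characters.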
utf8 flag enabled
If the utf8-flag is enabled, encode/decode will encode all characters using the corresponding UTF-8 multi-byte sequence, and will expect your input strings to be encoded as UTF-8, that is, no "character" of the input string must have any value > 255, as UTF-8 does not allow that.
The utf8 flag therefore switches between two modes: disabled means you will get a Unicode string in Perl, enabled means you get an UTF-8 encoded octet/binary string in Perl.
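To illustrate the enabled mode, the sketch below shows the double encoding from the question and one way around it. It assumes the input is a UTF-8 encoded byte string (the sample value is made up); `encode_json` and `decode_json` are the functional interface of JSON::XS, which works with the utf8 flag enabled:

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);
use JSON::XS;

my $bytes = "caf\xc3\xa9";                  # UTF-8 octets for "café"

# encode_json treats each input byte as a character and UTF-8 encodes it again.
my $double = encode_json([$bytes]);
print $double eq qq(["caf\xc3\x83\xc2\xa9"]) ? "double encoded\n" : "unexpected\n";

# Decoding the octets to characters first yields correctly encoded JSON.
my $chars  = decode_utf8($bytes);
my $single = encode_json([$chars]);
print $single eq qq(["caf\xc3\xa9"]) ? "encoded once\n" : "unexpected\n";

# decode_json expects UTF-8 octets and returns character strings.
my $back = decode_json($single);
print $back->[0] eq "caf\x{e9}" ? "decoded to characters\n" : "unexpected\n";
```

So if the data is kept as characters inside the program, `encode_json`/`decode_json` handle the UTF-8 step at the boundary. If the data must stay as encoded bytes in the hash, either decode it first as shown or use an object with the utf8 flag disabled.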
Upvotes: 1