Davs

Reputation: 489

Perl serialize UTF8 encoded data with JSON

I am a bit confused after reading the docs of JSON::XS. My question is: how can I encode/decode data that is already UTF-8 encoded? Calling encode_json seems to double-encode it. I would like to create JSON from a hash that contains UTF-8 encoded strings, and also decode JSON into a hash while keeping the data UTF-8 encoded. Is that possible, or do I need to manually call Encode::decode_utf8/encode_utf8 on the data myself?

Upvotes: 1

Views: 804

Answers (1)

See the perldoc for JSON::XS:

utf8 flag disabled

When utf8 is disabled (the default), then encode/decode generate and expect Unicode strings, that is, characters with high ordinal Unicode values (> 255) will be encoded as such characters, and likewise such characters are decoded as-is, no changes to them will be done, except "(re-)interpreting" them as Unicode codepoints or Unicode characters, respectively (to Perl, these are the same thing in strings unless you do funny/weird/dumb stuff).

This is useful when you want to do the encoding yourself (e.g. when you want to have UTF-16 encoded JSON texts) or when some other layer does the encoding for you (for example, when printing to a terminal using a filehandle that transparently encodes to UTF-8 you certainly do NOT want to UTF-8 encode your data first and have Perl encode it another time).
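A minimal sketch of the disabled mode. It uses the core module JSON::PP as a stand-in (an assumption made so the snippet runs without installing anything; it is designed to offer the same interface as JSON::XS). The variable names are only illustrative.

```perl
use strict;
use warnings;
use JSON::PP;   # assumption: core stand-in for JSON::XS, same interface

my $json = JSON::PP->new;             # utf8 flag is off by default

my $smiley = "\x{263A}";              # a single character, U+263A
my $text   = $json->encode([$smiley]);

# $text is a Perl character string: [ " <U+263A> " ]
print length($text), "\n";            # prints 5 (characters, not bytes)

my $back = $json->decode($text);
print length($back->[0]), "\n";       # prints 1: still one character
```

Nothing has been turned into UTF-8 bytes here, so whatever writes `$text` out (for example a filehandle with an encoding layer) still has to do the encoding.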

utf8 flag enabled

If the utf8-flag is enabled, encode/decode will encode all characters using the corresponding UTF-8 multi-byte sequence, and will expect your input strings to be encoded as UTF-8, that is, no "character" of the input string must have any value > 255, as UTF-8 does not allow that.
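For comparison, a small sketch of the enabled mode, again using the core JSON::PP as an assumed stand-in for JSON::XS. The functional interface (`encode_json` / `decode_json`) has the utf8 flag turned on.

```perl
use strict;
use warnings;
use JSON::PP qw(encode_json decode_json);  # assumption: stand-in for JSON::XS

my $chars = "caf\x{e9}";               # 4 characters, the last one is U+00E9

my $bytes = encode_json([$chars]);     # UTF-8 encoded octets
print length($bytes), "\n";            # prints 9: U+00E9 became two bytes

my $round = decode_json($bytes);       # expects UTF-8 octets as input
print length($round->[0]), "\n";       # prints 4: a character string again
```

So in this mode the JSON text is bytes, while the values inside your Perl data structure are still expected to be character strings.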

The utf8 flag therefore switches between two modes: disabled means you will get a Unicode string in Perl, enabled means you get an UTF-8 encoded octet/binary string in Perl.
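Applied to the question, here is one possible way to handle hash values that are already UTF-8 encoded: decode them before encoding to JSON, and encode them again after decoding from JSON. This is a sketch under assumptions, not the only approach. It uses the core JSON::PP as a stand-in for JSON::XS, and the hash key `name` is made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8 encode_utf8);
use JSON::PP;   # assumption: core stand-in for JSON::XS, same interface

my $utf8_bytes = "caf\xc3\xa9";        # "café" already UTF-8 encoded (5 bytes)
my $codec      = JSON::PP->new->utf8;  # same mode as encode_json/decode_json

# Problem: the bytes are treated as characters and encoded a second time.
my $double = $codec->encode({ name => $utf8_bytes });

# Fix: turn the bytes into characters first, then let the codec encode.
my $ok = $codec->encode({ name => decode_utf8($utf8_bytes) });

# Going back: decode gives characters; re-encode if bytes are required.
my $data       = $codec->decode($ok);
my $name_bytes = encode_utf8($data->{name});

print $name_bytes eq $utf8_bytes ? "round trip ok\n" : "mismatch\n";
```

In short, with the utf8 flag enabled the module does the UTF-8 step for the JSON text itself, so the strings inside the hash should be decoded character strings, not pre-encoded bytes.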

Upvotes: 1
