Reputation: 38532
Maybe this is just my unfamiliarity with Unicode, so please correct me if I'm mistaken.
Looking at http://json.org/, the spec says that a string can include "any UNICODE character", but this confuses me.
So what did they mean there?
Upvotes: 9
Views: 4801
Reputation: 70785
From the RFC:
3. Encoding

JSON text SHALL be encoded in Unicode. The default encoding is UTF-8.

Since the first two characters of a JSON text will always be ASCII characters [RFC0020], it is possible to determine whether an octet stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking at the pattern of nulls in the first four octets.

    00 00 00 xx  UTF-32BE
    00 xx 00 xx  UTF-16BE
    xx 00 00 00  UTF-32LE
    xx 00 xx 00  UTF-16LE
    xx xx xx xx  UTF-8
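The null-pattern table above can be turned into a few lines of code. This is only a rough sketch in Python, not anything taken from the RFC itself: the function names `detect_json_encoding` and `decode_json` are made up for illustration, and falling back to UTF-8 for inputs shorter than four octets is my own assumption.

```python
import json

# Null patterns of the first four octets, taken from the quoted table.
# True means "this octet is 0x00", False means "non-zero".
_NULL_PATTERNS = {
    (True, True, True, False): "utf-32-be",   # 00 00 00 xx
    (True, False, True, False): "utf-16-be",  # 00 xx 00 xx
    (False, True, True, True): "utf-32-le",   # xx 00 00 00
    (False, True, False, True): "utf-16-le",  # xx 00 xx 00
}


def detect_json_encoding(data: bytes) -> str:
    """Guess the UTF of a JSON octet stream (hypothetical helper)."""
    if len(data) < 4:
        # Assumption: too short to apply the table, so use the default.
        return "utf-8"
    nulls = tuple(octet == 0 for octet in data[:4])
    # Any pattern not in the table (including xx xx xx xx) is treated as UTF-8.
    return _NULL_PATTERNS.get(nulls, "utf-8")


def decode_json(data: bytes):
    """Decode the octets with the detected UTF, then parse the JSON text."""
    return json.loads(data.decode(detect_json_encoding(data)))
```

For example, `decode_json('{"a": "é"}'.encode("utf-16-le"))` should give back the same object as parsing the plain string. Note that this relies on the text starting with two ASCII characters, as the quoted section says, and it does not handle a byte order mark.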
Upvotes: 18
Reputation: 285047
You're correct that everything must translate into bytes, and that usually happens through a UTF (Unicode Transformation Format). The JSON RFC explains in section 3 how to tell which UTF is being used.
Upvotes: 1
Reputation: 1039438
JSON is a serialization format whose strings can include Unicode characters. What actually gets sent over the wire is the byte representation of that Unicode text, normally over HTTP, where the headers tell the client which encoding was used (typically UTF-8).
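To make the difference between the Unicode text and its byte representation concrete, here is a small Python sketch. The payload and the header value are just illustrative assumptions on my part, not something a particular server is known to send.

```python
import json

# Hypothetical payload containing non-ASCII characters.
payload = {"greeting": "h\u00e9llo \u2603"}

# JSON text as a Unicode string; ensure_ascii=False keeps the characters as-is.
text = json.dumps(payload, ensure_ascii=False)

# The octets that would actually travel over the wire.
body = text.encode("utf-8")

# Illustrative header a server might use to announce the encoding (assumption).
headers = {"Content-Type": "application/json; charset=utf-8"}

# Alternative: escape every non-ASCII character as \uXXXX, so the JSON text
# is pure ASCII and reads the same in any ASCII-compatible encoding.
escaped = json.dumps(payload)
```

Here `body` is longer than `text` because the non-ASCII characters take more than one byte each in UTF-8, while `escaped` parses back to the same object even though it contains only ASCII.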
Upvotes: 3