bukzor

Reputation: 38532

JSON specifies "any UNICODE character"?

Maybe this is just my unfamiliarity with unicode, so please correct me if I'm mistaken.

Looking at http://json.org/, the spec says that a string can include "any UNICODE character", but this confuses me.

So what did they mean there?

Upvotes: 9

Views: 4801

Answers (3)

cobbal

Reputation: 70785

From the RFC (RFC 4627, section 3):

3.  Encoding

   JSON text SHALL be encoded in Unicode.  The default encoding is
   UTF-8.

   Since the first two characters of a JSON text will always be ASCII
   characters [RFC0020], it is possible to determine whether an octet
   stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking
   at the pattern of nulls in the first four octets.

           00 00 00 xx  UTF-32BE
           00 xx 00 xx  UTF-16BE
           xx 00 00 00  UTF-32LE
           xx 00 xx 00  UTF-16LE
           xx xx xx xx  UTF-8
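The null-pattern table above could be turned into code along these lines. This is only a rough sketch, not anything taken from the RFC itself: the function name `detect_json_encoding` is made up for illustration, and it relies on the same assumption the quoted section states, namely that the first two characters of the text are ASCII.

```python
def detect_json_encoding(data: bytes) -> str:
    """Guess the UTF of a JSON octet stream from the nulls in its first four octets.

    Hypothetical helper; assumes the first two characters are ASCII,
    as the quoted RFC section does.
    """
    head = data[:4]
    if len(head) < 4:
        # Assumption: too short to inspect, so fall back to the default encoding.
        return "utf-8"

    # True where the octet is a null (00), False where it is anything else (xx).
    nulls = tuple(octet == 0 for octet in head)
    patterns = {
        (True, True, True, False): "utf-32-be",   # 00 00 00 xx
        (True, False, True, False): "utf-16-be",  # 00 xx 00 xx
        (False, True, True, True): "utf-32-le",   # xx 00 00 00
        (False, True, False, True): "utf-16-le",  # xx 00 xx 00
    }
    # Anything else (xx xx xx xx) is treated as UTF-8.
    return patterns.get(nulls, "utf-8")


if __name__ == "__main__":
    sample = '{"a": 1}'
    for name in ("utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be"):
        print(name, "->", detect_json_encoding(sample.encode(name)))
```

Note that the sample is encoded with the explicit `-le`/`-be` codec names so that Python does not prepend a byte order mark, which would change the first four octets.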

Upvotes: 18

Matthew Flaschen

Reputation: 285047

You're correct that everything must translate into bytes, and that usually occurs through a UTF (Unicode Transformation Format). The JSON RFC explains in section 3 how to tell which UTF is being used.

Upvotes: 1

Darin Dimitrov

Reputation: 1039438

JSON is a serialization format that can include Unicode characters. The byte representation of this Unicode string is what usually gets sent over the wire, normally over HTTP, which uses headers to tell the client the encoding, typically UTF-8.
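To make the character-versus-bytes distinction concrete, here is a small sketch using Python's standard `json` module. The sample character and variable names are just illustrative choices, and UTF-8 is assumed as the wire encoding.

```python
import json

text = "é"  # one Unicode character, U+00E9

# By default json.dumps emits ASCII only, escaping the character as \u00e9.
escaped = json.dumps(text)

# With ensure_ascii=False the character appears literally in the JSON text.
literal = json.dumps(text, ensure_ascii=False)

# The JSON text only becomes bytes once an encoding is chosen (UTF-8 assumed here).
wire_bytes = literal.encode("utf-8")

print(escaped)      # "\u00e9"
print(wire_bytes)   # b'"\xc3\xa9"'

# Both forms decode back to the same single character.
assert json.loads(escaped) == json.loads(wire_bytes.decode("utf-8")) == text
```

Either way the JSON string contains the same Unicode character; only the serialized text and the bytes on the wire differ.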

Upvotes: 3
