dan gibson

Reputation: 3665

In the JSON spec, what does "Since the first two characters of a JSON text will always be ASCII characters" mean?

RFC 4627 on JSON reads:

  1. Encoding

    JSON text SHALL be encoded in Unicode. The default encoding is UTF-8.

    Since the first two characters of a JSON text will always be ASCII characters [RFC0020], it is possible to determine whether an octet stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking at the pattern of nulls in the first four octets.

What does "Since the first two characters of a JSON text will always be ASCII characters [RFC0020]" mean? I've looked at RFC0020 but couldn't find anything about it. JSON could start with {" or with { " (i.e. whitespace before the quote).

Upvotes: 8

Views: 3539

Answers (2)

Tom Blodget

Reputation: 20802

RFC 4627 requires a JSON document to represent either an object or an array. So, the first characters (after any amount of JSON whitespace) must be [ followed by a value, or { followed by ". Values are null, true, false, a string ("…), an object, or an array. Since the JSON whitespace characters, [, {, n, t, f, and " are all in the C0 Controls and Basic Latin block, they are also in the ASCII character set [by the design of Unicode]. (Not sure why the standard is fixated on "ASCII" when it says, "JSON text SHALL be encoded in Unicode." Future standards drop the reference.)
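
For illustration, here is a minimal Python sketch of how the two-character prefix {" is encoded in each allowed encoding; only UTF-8 leaves no null bytes among the first four octets:

    # How the prefix '{"' (U+007B U+0022) looks in each allowed encoding.
    prefix = '{"'
    for enc in ("utf-8", "utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le"):
        print(enc, prefix.encode(enc).hex(" "))
    # utf-8      7b 22
    # utf-16-be  00 7b 00 22
    # utf-16-le  7b 00 22 00
    # utf-32-be  00 00 00 7b 00 00 00 22
    # utf-32-le  7b 00 00 00 22 00 00 00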

UTF-32 has four bytes per character. UTF-16 has two. So, to distinguish between UTF-16 and UTF-32, you need 4 bytes. In both of those encodings, characters from the C0 Controls and Basic Latin block are encoded with at most one non-zero byte (a byte with a value of 0 is sometimes called a "null byte"). Also, U+0000 (which is encoded as 0x00 0x00 0x00 0x00 in UTF-32 and 0x00 0x00 in UTF-16) is not valid JSON whitespace. So, the pattern of 0x00 bytes can be used to determine which of the allowed encodings a valid JSON document uses.
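
A reader could apply RFC 4627's null-byte patterns roughly like this (a minimal Python sketch; the function name is mine):

    def detect_json_encoding(data: bytes) -> str:
        # Guess the encoding of a pre-RFC 7159 JSON text from the pattern
        # of null bytes in its first four octets (RFC 4627).
        if len(data) < 4:
            raise ValueError("need at least four octets")
        z = [b == 0 for b in data[:4]]
        if z == [True, True, True, False]:
            return "utf-32-be"    # 00 00 00 xx
        if z == [True, False, True, False]:
            return "utf-16-be"    # 00 xx 00 xx
        if z == [False, True, True, True]:
            return "utf-32-le"    # xx 00 00 00
        if z == [False, True, False, True]:
            return "utf-16-le"    # xx 00 xx 00
        return "utf-8"            # xx xx xx xx (no nulls)

    detect_json_encoding('{"a":1}'.encode("utf-16-le"))  # -> "utf-16-le"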

RFC 7159 changed JSON to allow a JSON document to represent any value, not just an object or array. That invalidated the statement quoted above, so the now-broken encoding detection scheme was removed from the standard.

For accurate detection, you need to see the beginning and the end of the document. 0x22 0x00 0x00 0x00 at the beginning could be any of UTF-8, UTF-16LE, or UTF-32LE; it's the start of a string with zero or more U+0000 characters. In this case, you need the number of 0x00 bytes at the end to tell which.

RFC 8259 changed JSON to require UTF-8 (for JSON "exchanged between systems that are not part of a closed ecosystem"). Out of practicality, a JSON reader would still accept UTF-16 and UTF-32.


In the end, some popular JSON parsers leave character decoding up to the caller, having APIs that accept only the "native" string type for the programming environment. (This opens up the very common hazard of using the wrong character encoding for reading text files or HTTP bodies.)
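
With such an API, the caller has to decode the bytes and pick the encoding itself, along these lines (a trivial Python sketch; the file name is hypothetical):

    import json

    with open("data.json", "rb") as f:   # hypothetical file
        raw = f.read()

    # The caller chooses the encoding; a wrong guess mangles non-ASCII text.
    obj = json.loads(raw.decode("utf-8"))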

Upvotes: 6

Oded

Reputation: 499062

It means that since JSON will always start with ASCII characters (non-ASCII is only permitted in strings, which cannot be the root object), it is possible to determine from the start of the stream/file what encoding it is in.

UTF-16 and UTF-32 streams should have a BOM at the start, and by finding out what it is, you can determine the exact encoding. This is possible because you can then check whether the decoded first characters are valid JSON or not.
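
For example, BOM sniffing can be sketched like this (a minimal Python illustration; the longer UTF-32 BOMs must be checked before the UTF-16 ones, because the UTF-32LE BOM begins with the UTF-16LE BOM):

    import codecs

    # Longest BOMs first: BOM_UTF32_LE (FF FE 00 00) starts with
    # BOM_UTF16_LE (FF FE), so order matters.
    BOMS = [
        (codecs.BOM_UTF32_BE, "utf-32-be"),
        (codecs.BOM_UTF32_LE, "utf-32-le"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
        (codecs.BOM_UTF8, "utf-8-sig"),
    ]

    def sniff_bom(data: bytes):
        for bom, name in BOMS:
            if data.startswith(bom):
                return name
        return None   # no BOM: fall back to the null-byte pattern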

I assume the spec specifically mentions this because for many other text streams/files this is not possible (most text files can start with any two characters, so the first bytes of the actual file are not known in advance).

Upvotes: 8
