Reputation: 6590
Having read Joel on Encoding like a good boy, I find myself perplexed by the workings of Foundation's JSONDecoder, neither of whose init or decode methods takes an encoding value. Looking through the docs, I see the instance variable dataDecodingStrategy, which is perhaps where the encoding-guessing magic happens...?
Am I missing something here? Shouldn't JSONDecoder need to know the encoding of the data it receives? I realize that the JSON standard requires this data to be UTF-8 encoded, but can JSONDecoder be making that assumption? I'm confused.
Upvotes: 2
Views: 1184
Reputation: 539715
RFC 8259 (from 2017) requires that
JSON text exchanged between systems that are not part of a closed ecosystem MUST be encoded using UTF-8.
The older RFC 7159 (from 2014) and RFC 7158 (dated 2013) only stated that
JSON text SHALL be encoded in UTF-8, UTF-16, or UTF-32. The default encoding is UTF-8, and JSON texts that are encoded in UTF-8 are interoperable in the sense that they will be read successfully by the maximum number of implementations; there are many implementations that cannot successfully read texts in other encodings (such as UTF-16 and UTF-32).
And RFC 4627 (from 2006, the oldest one that I could find):
JSON text SHALL be encoded in Unicode. The default encoding is UTF-8.
Since the first two characters of a JSON text will always be ASCII characters, it is possible to determine whether an octet stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking at the pattern of nulls in the first four octets.
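The null-pattern rule quoted above can be sketched in a few lines of Swift. This is only an illustration of the table in RFC 4627, section 3, not how Foundation actually does it; the names JSONTextEncoding and guessJSONEncoding are made up for this example, and the sketch assumes the data has no byte order mark and starts with two ASCII characters.

```swift
import Foundation

/// Encodings that RFC 4627 allows for a JSON text (hypothetical helper type).
enum JSONTextEncoding {
    case utf8, utf16BigEndian, utf16LittleEndian, utf32BigEndian, utf32LittleEndian
}

/// Guesses the encoding from the pattern of zero bytes in the first four octets.
/// Assumption: no byte order mark, and the text starts with two ASCII characters.
func guessJSONEncoding(of data: Data) -> JSONTextEncoding {
    let head = [UInt8](data.prefix(4))
    // Fewer than four octets: the table cannot be applied, so fall back to the default.
    guard head.count == 4 else { return .utf8 }
    switch (head[0] == 0, head[1] == 0, head[2] == 0, head[3] == 0) {
    case (true, true, true, false):  return .utf32BigEndian    // 00 00 00 xx
    case (true, false, true, false): return .utf16BigEndian    // 00 xx 00 xx
    case (false, true, true, true):  return .utf32LittleEndian // xx 00 00 00
    case (false, true, false, true): return .utf16LittleEndian // xx 00 xx 00
    default:                         return .utf8              // xx xx xx xx
    }
}

let sample = "[1, 2, 3]".data(using: .utf16LittleEndian)!
print(guessJSONEncoding(of: sample)) // utf16LittleEndian
```

Note that this only works because RFC 4627 restricts a JSON text to an object or an array; it says nothing about data that starts with a byte order mark.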
JSONDecoder (which uses JSONSerialization under the hood) is able to decode UTF-8, UTF-16, and UTF-32, both little-endian and big-endian. Example:
import Foundation

let data = "[1, 2, 3]".data(using: .utf16LittleEndian)!
print(data as NSData) // <5b003100 2c002000 32002c00 20003300 5d00>
let a = try! JSONDecoder().decode([Int].self, from: data)
print(a) // [1, 2, 3]
Since a JSON text as defined by RFC 4627 must start with "[" or "{", the encoding can be determined unambiguously from the first bytes of the data.
I did not find this documented though, and one probably should not rely on it. A future implementation of JSONDecoder might support only the newer standard and require UTF-8.
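If you would rather not depend on that behaviour, one option is to convert the data to UTF-8 yourself before handing it to the decoder. The following is a minimal sketch, assuming you already know the source encoding; the helper name utf8JSONData is hypothetical and not part of Foundation.

```swift
import Foundation

/// Re-encodes JSON data from a known source encoding to UTF-8.
/// Returns nil if the bytes are not valid in `encoding`.
/// (Hypothetical helper, not a Foundation API.)
func utf8JSONData(from data: Data, encoding: String.Encoding) -> Data? {
    guard let text = String(data: data, encoding: encoding) else { return nil }
    return Data(text.utf8)
}

let utf16Data = "[1, 2, 3]".data(using: .utf16LittleEndian)!
if let utf8Data = utf8JSONData(from: utf16Data, encoding: .utf16LittleEndian),
   let numbers = try? JSONDecoder().decode([Int].self, from: utf8Data) {
    print(numbers) // [1, 2, 3]
}
```

This way the decoder only ever sees UTF-8, which is what RFC 8259 requires anyway.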
Upvotes: 5