Reputation: 8778
These two commands output different results:
In [102]: json.dumps({'Café': 1}, ensure_ascii=False, encoding='utf-8')
Out[102]: '{"Caf\xc3\xa9": 1}'
In [103]: json.dumps({'Café': 1}, ensure_ascii=False, encoding='utf8')
Out[103]: u'{"Caf\xe9": 1}'
What's the difference between 'utf-8' and 'utf8'?
Upvotes: 3
Views: 183
Reputation: 27724
Notice that the second call returns a unicode object, while the first returns a byte string (str).
It seems strange but the documentation calls this out:
If ensure_ascii is False, the result may contain non-ASCII characters and the return value may be a unicode instance.
It would appear that a byte string comes back only when the encoding is spelled exactly 'utf-8', ensure_ascii=False is set, AND the input is a UTF-8 encoded byte string (not unicode). With a unicode input, the result is unicode even then:
>>> json.dumps({u'Caf€': 1}, ensure_ascii=False, encoding='utf-8')
u'{"Caf\u20ac": 1}'
With ensure_ascii=False, all other valid encoding names (including 'utf8') return a unicode instance.
If you set ensure_ascii=True, the output is consistent (an ASCII str with \uXXXX escapes) and works with other encodings, such as "windows-1252" (the input needs to be unicode).
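To check that the two spellings name the same codec, here is a small sketch using the standard codecs module. It only shows that Python's codec registry treats 'utf8' as an alias of 'utf-8'. My assumption, not something the documentation quoted above states, is that the different json.dumps results come from the Python 2 json encoder comparing the encoding argument against the literal string 'utf-8' rather than from any difference in the codecs themselves.

```python
import codecs

# Look up both spellings in the codec registry.
info_dash = codecs.lookup('utf-8')
info_plain = codecs.lookup('utf8')

# Both lookups normalize to the same canonical codec name.
same_codec = info_dash.name == info_plain.name
print(info_dash.name, info_plain.name, same_codec)
```

If that assumption holds, any alias other than the exact string 'utf-8' causes byte-string input to be decoded first, which is why the result becomes unicode.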
I guess the rationale is that JSON should be ASCII by default and all non-ASCII characters should be escaped, even when the encoding is UTF-8.
To avoid any surprises, follow these rules:
For spec-compliant ASCII JSON:
Call:
>>> json.dumps({u'Caf€': 1}, ensure_ascii=True)
'{"Caf\\u20ac": 1}'
For UTF-8 encoded JSON:
Call:
>>> json.dumps({u'Caf€': 1}, ensure_ascii=False).encode("utf-8")
'{"Caf\xe2\x82\xac": 1}'
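The two rules above can be wrapped in a pair of small helpers. This is only a sketch: the names dumps_ascii and dumps_utf8 are made up for illustration, and it assumes all text in the input is unicode (not pre-encoded byte strings), as recommended above. The key is written with a \u20ac escape so the snippet does not depend on the source file's encoding.

```python
import json

def dumps_ascii(obj):
    # Hypothetical helper: pure-ASCII JSON, non-ASCII characters are \uXXXX-escaped.
    return json.dumps(obj, ensure_ascii=True)

def dumps_utf8(obj):
    # Hypothetical helper: serialize without escaping, then encode
    # the text explicitly so the result is always UTF-8 bytes.
    # Assumes every string in obj is unicode text.
    return json.dumps(obj, ensure_ascii=False).encode('utf-8')

data = {u'Caf\u20ac': 1}
ascii_json = dumps_ascii(data)
utf8_json = dumps_utf8(data)
print(repr(ascii_json))
print(repr(utf8_json))
```

Doing the encode step yourself means the return type no longer depends on how the encoding argument is spelled.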
Upvotes: 1