Reputation: 44381
(This question is related to this one)
Take a look at the following session:
Python 2.7.3 (default, Jan 2 2013, 16:53:07)
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> import simplejson as json
>>>
>>> my_json = '''[
... {
... "id" : "normal",
... "txt" : "This is a normal entry"
... },
... {
... "id" : "αβγδ",
... "txt" : "This is a unicode entry"
... }
... ]'''
>>>
>>> cache = json.loads(my_json, encoding='utf-8')
>>>
>>> cache
[{'txt': 'This is a normal entry', 'id': 'normal'}, {'txt': 'This is a unicode entry', 'id': u'\u03b1\u03b2\u03b3\u03b4'}]
Why is the json decoder producing sometimes unicode, and sometimes plain strings? Isn't it supposed to produce always unicode?
Upvotes: 2
Views: 193
Reputation: 10177
It seems to be an optimization in simplejson, from simplejson docs:
If s is a str then decoded JSON strings that contain only ASCII characters may be parsed as str for performance and memory reasons. If your code expects only unicode the appropriate solution is decode s to unicode prior to calling decode.
Note: Any any characters included in ASCII are encoded the same in both UTF-8 and ASCII. So ASCII is a subset of UTF-8.
Upvotes: 4