user296546

Reputation: 135

Cannot output a JSON-encoded dict containing accents (noob inside)

Here is a fairly simple example which has been driving me nuts for a couple of days. Consider the following script:

# -*- coding: utf-8 -*-
from json import dumps as json_dumps

machaine = u"une personne émérite"
print(machaine)

output = {}
output[1] = machaine
jsonoutput = json_dumps(output)
print(jsonoutput)

Running it from the command line gives:

une personne émérite
{"1": "une personne \u00e9m\u00e9rite"}

I don't understand why there is such a difference between the two strings. I have been trying all sorts of encode, decode, etc., but I can't seem to find the right way to do it. Does anybody have an idea?

Thanks in advance. Matthieu

Upvotes: 2

Views: 1807

Answers (2)

DS.

Reputation: 24110

To clarify Marcelo Cantos's answer: json.dumps() returns a JSON encoding of your data, which is an ASCII string starting with the character '{' and containing backslashes, quotes, etc. The \u00e9 sequences are simply the JSON escape for 'é'. You have to decode the string (e.g. with json.loads()) to get back the actual dict with its data.

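# -*- coding: utf-8 -*-
import json

output = {1: u"une personne émérite"}
print(output[1])

# dumps() returns JSON text; by default non-ASCII characters become \uXXXX escapes
json_encoded = json.dumps(output)
print("Encoded: %s" % repr(json_encoded))

# loads() turns the JSON text back into a dict (keys come back as strings)
decoded = json.loads(json_encoded)
print(decoded['1'])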

outputs:

une personne émérite
Encoded: '{"1": "une personne \\u00e9m\\u00e9rite"}'
une personne émérite
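
If you would rather see the accents in the JSON text itself, json.dumps() also accepts ensure_ascii=False. The following is only a minimal sketch, assuming Python 3 and a terminal that can display UTF-8; the variable names escaped and readable are mine, not from the question:

```python
import json

output = {1: "une personne émérite"}

# Default behaviour: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(output)

# ensure_ascii=False leaves the accented characters in the JSON text as-is.
readable = json.dumps(output, ensure_ascii=False)

print(escaped)
print(readable)
```

Both strings are valid JSON for the same data, so which one to use mostly depends on whether a human or a program reads the output.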

Upvotes: 2

Marcelo Cantos

Reputation: 185902

The encoding is correct. Load it back in and print it, and you'll see the correct output:

>>> import json
>>> jsoninput = json.loads(jsonoutput)
>>> print jsoninput
{u'1': u'une personne \xe9m\xe9rite'}
>>> print jsoninput['1']
une personne émérite
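
To convince yourself that the escaped form loses nothing, you can compare it with the unescaped form after decoding. This is a small sketch assuming Python 3; the names escaped_text and verbatim_text are made up for the illustration:

```python
import json

# The same JSON document written two ways: accents escaped, and accents verbatim.
escaped_text = '{"1": "une personne \\u00e9m\\u00e9rite"}'
verbatim_text = '{"1": "une personne \u00e9m\u00e9rite"}'

# Both decode to the same dict, so the \u00e9 escapes are purely a text-level detail.
same = json.loads(escaped_text) == json.loads(verbatim_text)

# Each escape decodes to a single character, not six.
length = len(json.loads(escaped_text)["1"])

print(same, length)
```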

Upvotes: 3
