Jort

Reputation: 1411

UnicodeDecodeError when using json.dumps

I'm trying to return a JSON object that contains special characters. The line that crashes is:

return json.dumps([x.toDict() for x in searches], ensure_ascii=False)

The toDict function:

def toDict(self):
  """Export to a dict. Needed before serialization to JSON."""
  out = {}
  if self.wkid is not None:
    out['wkid'] = self.wkid
  if self.wkt is not None:
    out['wkt'] = self.wkt
  return out

When I print x.toDict() for each x in searches:

for x in searches:
  print x.toDict()

{'crs': {'wkid': 4326, 'wkt': 'WGS84'}, 'candidates': [{'score': 200, 'type': 'ADR', 'location': {'y': 50.2485465358886, 'x': 4.38243469412172, 'crs': {'wkid': 4326, 'wkt': 'WGS84'}}, 'address': {'city': 'Fontenelle', 'munkey': '0585'}}, {'score': 200, 'type': 'ADR', 'location': {'y': 50.4123146893214, 'x': 4.32436581556278, 'crs': {'wkid': 4326, 'wkt': 'WGS84'}}, 'address': {'city': "Fontaine-l'\xe3\x89v\xe3\xaaque", 'munkey': '0324'}}, {'score': 200, 'type': 'ADR', 'location': {'y': 50.3217667573625, 'x': 4.21386030471998, 'crs': {'wkid': 4326, 'wkt': 'WGS84'}}, 'address': {'city': 'Fontaine-valmont', 'munkey': '0362'}}, {'score': 200, 'type': 'ADR', 'location': {'y': 49.7151404477129, 'x': 5.23438436377951, 'crs': {'wkid': 4326, 'wkt': 'WGS84'}}, 'address': {'city': 'Fontenoille', 'munkey': '0541'}}], 'id': u'1', 'address': {'city': u'Fontaine-lev', 'street': u'Avenue des Chones', 'zone': u'1301', 'house': u'19'}}

This works fine. However, when I try:

for x in searches:
  print json.dumps(x.toDict(), ensure_ascii=False)

The error that I'm getting is:

UnicodeDecodeError('ascii', '"Fontaine-l\'\xe3\x89v\xe3\xaaque"', 12, 13, 'ordinal not in range(128)')
'ascii' codec can't decode byte 0xe3 in position 12: ordinal not in range(128).

This is strange, since I pass ensure_ascii=False precisely so that the text is not decoded.

Why is it still trying to decode the text?

Upvotes: 0

Views: 142

Answers (1)

Shanavas M

Reputation: 1629

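ensure_ascii=False does not stop json.dumps from decoding strings. It only leaves non-ASCII characters unescaped in the output. When the input mixes byte strings with unicode strings, as yours does (note the u'...' values next to the plain '...' ones), Python 2 still has to convert the byte strings to unicode, and it does so with the ASCII codec.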

If ensure_ascii is false, some chunks written to fp may be unicode instances. This usually happens because the input contains unicode strings or the encoding parameter is used. Unless fp.write() explicitly understands unicode (as in codecs.getwriter()) this is likely to cause an error.

Source: the Python documentation for json.dump.
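One way around this is to decode every byte string before serializing, so json.dumps only ever sees unicode. The following is a minimal sketch, not a definitive fix: to_text is a hypothetical helper, the record is made-up sample data shaped like the output of toDict(), and UTF-8 is an assumed encoding. The bytes as printed in the question (\xe3\x89...) do not form valid UTF-8, so the real source encoding would have to be confirmed first.

```python
import json


def to_text(obj, encoding="utf-8"):
    """Recursively decode byte strings so the structure holds only text.

    Hypothetical helper; the utf-8 default is an assumption.
    """
    if isinstance(obj, bytes):  # `bytes` is an alias of `str` on Python 2
        return obj.decode(encoding)
    if isinstance(obj, dict):
        return dict((to_text(k, encoding), to_text(v, encoding))
                    for k, v in obj.items())
    if isinstance(obj, (list, tuple)):
        return [to_text(item, encoding) for item in obj]
    return obj


# Made-up record mixing a UTF-8 byte string with a unicode string.
record = {"city": b"Fontaine-l'\xc3\x89v\xc3\xaaque", "id": u"1"}

# The implicit conversion Python 2 attempts when it joins the two kinds of
# string amounts to this explicit ASCII decode, which fails the same way.
try:
    record["city"].decode("ascii")
except UnicodeDecodeError as exc:
    decode_error = exc

# Decoding up front leaves nothing for json.dumps to convert implicitly.
payload = json.dumps(to_text(record), ensure_ascii=False, sort_keys=True)
```

Applied to the code in the question, the call would look like json.dumps([to_text(x.toDict()) for x in searches], ensure_ascii=False), with the encoding argument set to whatever the data source really uses.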

Upvotes: 2
