ntl0ve

Reputation: 1986

Reading JSON: what encoding is "\u00c5\u0082"? How do I get it to a unicode object?

One of the values in a JSON file I'm parsing is Wroc\u00c5\u0082aw. How can I turn this string into a unicode object that yields "Wrocław" (which is the correct decoding in this case)?

Upvotes: 7

Views: 18496

Answers (2)

Seth Gordon

Reputation: 193

It looks like whatever process generated that JSON took UTF-8-encoded text and mistook it for Latin-1-encoded text. To fix the error, run the same process in reverse: encode the string as Latin-1 to recover the original bytes, then decode those bytes as UTF-8:

>>> u'Wroc\u00c5\u0082aw'.encode('iso-8859-1').decode('utf-8')
u'Wroc\u0142aw'
>>> import unicodedata
>>> unicodedata.name(u'\u0142')
'LATIN SMALL LETTER L WITH STROKE'
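
The same round trip can be sketched for Python 3, where every str is already Unicode and the value comes straight out of json.loads. This is only an illustration under assumptions: the helper name fix_mojibake, the "city" key, and the sample JSON text are made up for the example and are not from the question. The helper returns its input unchanged when the round trip fails, which is what happens for text that was never mis-decoded.

```python
import json


def fix_mojibake(text):
    """Undo a UTF-8-read-as-Latin-1 mix-up (hypothetical helper).

    Returns the text unchanged if it cannot be encoded as Latin-1
    or if the resulting bytes are not valid UTF-8.
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# Assumed sample input: the JSON text contains the literal escapes
# \u00c5\u0082, as in the question.
raw = '{"city": "Wroc\\u00c5\\u0082aw"}'
data = json.loads(raw)

city = fix_mojibake(data["city"])
print(city)
```

Applying the repair blindly to every string is risky, since some correctly decoded text could also survive the round trip and be changed by it, so it is safer to apply it only to fields known to be affected.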

Upvotes: 7

Antoine Marliac

Reputation: 26

It looks like your JSON doesn't have the right encoding, because neither \u00c5 nor \u0082 yields the character you're expecting in any encoding.

You could try encoding this value as UTF-8 or UTF-16, though.

Upvotes: 1
