Reputation: 1986
One of the values in a JSON file I'm parsing is Wroc\u00c5\u0082aw. How can I turn this string into a unicode object that yields "Wrocław" (which is the correct decoding in this case)?
Upvotes: 7
Views: 18496
Reputation: 193
It looks like whatever process generated that JSON took UTF-8-encoded text and mistook it for Latin-1-encoded text. To fix the error, run the same process in reverse:
>>> u'Wroc\u00c5\u0082aw'.encode('iso-8859-1').decode('utf-8')
u'Wroc\u0142aw'
>>> import unicodedata
>>> unicodedata.name(u'\u0142')
'LATIN SMALL LETTER L WITH STROKE'
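For a whole JSON document, the same round trip can be wrapped in a small helper. This is only a sketch in Python 3: the name `fix_mojibake`, the `city` key, and the sample document are made up for illustration, and it assumes the damage really is UTF-8 bytes misread as Latin-1. Strings that don't fit that pattern are returned unchanged.

```python
import json


def fix_mojibake(text):
    """Undo UTF-8 bytes that were mistakenly decoded as Latin-1.

    Hypothetical helper: returns the input unchanged when the
    round trip is not possible, so clean strings are left alone.
    """
    try:
        # Recover the original bytes, then decode them as UTF-8.
        return text.encode("iso-8859-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# Sample document (assumed); the escapes mirror the value in the question.
raw = '{"city": "Wroc\\u00c5\\u0082aw"}'
city = json.loads(raw)["city"]
fixed = fix_mojibake(city)
print(fixed)
```

The fallback is deliberate: a string that is already correct either can't be encoded as Latin-1 or doesn't decode as valid UTF-8 in most cases, so it passes through untouched. It isn't foolproof, so it's safer to apply it only to fields you know are affected.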
Upvotes: 7
Reputation: 26
It looks like your JSON doesn't have the right encoding, because neither \u00c5 nor \u0082 yields the characters you're expecting in any encoding.
But you could try encoding this value as UTF-8 or UTF-16.
Upvotes: 1