F.Hand

Reputation: 83

Reading JSON files with UTF-8 characters in Python

I have a large JSON file with UTF-8 encoded characters. How can I read this file and convert these characters to a more readable form? I have something like this:

{
    "name": "Wroc\u00c5\u0082aw"
}

and I want to have this:

{
    "name": "Wrocław"
}

Upvotes: 0

Views: 787

Answers (1)

tripleee

Reputation: 189948

If your JSON data contains mojibake like this, you can convert it to proper Unicode by encoding the string as Latin-1, then decoding the resulting bytes as UTF-8. This reverses whichever process produced the mojibake (most likely UTF-8 bytes that were misread as Latin-1 somewhere upstream). The fact that the strings come from JSON is inconsequential; this works for any mojibake string of this type.

>>> s = "Wroc\u00c5\u0082aw"
>>> s.encode('latin-1').decode('utf-8')
'Wrocław'
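
To see why this works, here is a small sketch that runs the process forwards and then back again. It assumes the damage came from UTF-8 bytes being decoded as Latin-1; the question doesn't say how the file was actually produced, so treat that as a guess. The variable names are just for illustration.

```python
# Assumption: the mojibake arose from UTF-8 bytes being read as Latin-1.
original = "Wrocław"

# "ł" (U+0142) becomes the two bytes C5 82 in UTF-8.
utf8_bytes = original.encode("utf-8")

# Misreading those bytes as Latin-1 yields one character per byte.
mojibake = utf8_bytes.decode("latin-1")

# The repair is the same two steps in reverse order.
repaired = mojibake.encode("latin-1").decode("utf-8")

print(mojibake == "Wroc\u00c5\u0082aw")
print(repaired == original)
```

Both comparisons should come out true, which matches the escaped string shown in the question.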

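In the general case, you have to reverse-engineer what produced the mojibake, but this particular case is easy to identify and troubleshoot, because the Latin-1 encoding is transparent: each code point from U+0000 to U+00FF is encoded as the single byte with the same value. The \u00c5\u0082 in the JSON therefore corresponds directly to the bytes C5 82, which is the UTF-8 encoding of ł.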
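
For a whole JSON document, one possible approach is to parse it first and then apply the same fix to every string in the result. The following is only a sketch: `fix_mojibake` is a hypothetical helper name, and the fallback of leaving a string untouched when the round trip fails is my own assumption about what you'd want for values that are already correct.

```python
import json


def fix_mojibake(value):
    """Recursively repair UTF-8-read-as-Latin-1 mojibake in parsed JSON data."""
    if isinstance(value, str):
        try:
            return value.encode("latin-1").decode("utf-8")
        except UnicodeError:
            # Either the string has characters outside Latin-1 or the bytes
            # are not valid UTF-8, so it is not this kind of mojibake.
            return value
    if isinstance(value, list):
        return [fix_mojibake(item) for item in value]
    if isinstance(value, dict):
        return {fix_mojibake(k): fix_mojibake(v) for k, v in value.items()}
    # Numbers, booleans and None pass through unchanged.
    return value


# Stand-in for the contents of the file from the question.
raw = '{"name": "Wroc\\u00c5\\u0082aw"}'
data = fix_mojibake(json.loads(raw))

# ensure_ascii=False writes "ł" itself instead of a \u escape.
print(json.dumps(data, ensure_ascii=False, indent=4))
```

With a real file you would use `json.load` on a file opened with `encoding="utf-8"`, and `json.dump(..., ensure_ascii=False)` to write the result back. Note that the try/except is a heuristic: a string that is legitimately made of Latin-1 characters which also happen to form valid UTF-8 would be changed too, so it's worth spot-checking the output.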

Upvotes: 2
