Reputation: 83
I have a large JSON file with UTF-8 encoded characters. How can I read this file and convert these characters to a more readable version? I have something like this:
{
"name": "Wroc\u00c5\u0082aw"
}
and I want to have this:
{
"name": "Wrocław"
}
Upvotes: 0
Views: 787
Reputation: 189948
If your JSON data contains mojibake like this, you can convert it to proper Unicode by encoding the string as Latin-1, then decoding the resulting bytes as UTF-8. This reverses the process that produced the mojibake, in which UTF-8 bytes were mistakenly decoded as Latin-1. (The fact that the strings come from JSON is inconsequential; this works for any mojibake strings of this type.)
>>> s = "Wroc\u00c5\u0082aw"
>>> s.encode('latin-1').decode('utf-8')
'Wrocław'
In the general case, you have to reverse-engineer what produced the mojibake, but this particular case is easy to identify and troubleshoot, because Latin-1 is transparent: each byte value 0x00 to 0xFF maps to the Unicode code point with the same number, so the original UTF-8 bytes (here C5 82 for "ł") are visible directly in the \u00c5\u0082 escapes.
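To apply the same fix to a whole JSON document rather than a single string, something like the following sketch might work. The helper name fix_mojibake is made up for illustration, and the code assumes every damaged string is UTF-8 that was misread as Latin-1; strings that do not round-trip are left unchanged.

```python
import json

def fix_mojibake(value):
    """Recursively undo UTF-8-read-as-Latin-1 mojibake in parsed JSON data.

    Hypothetical helper, not part of any library.
    """
    if isinstance(value, str):
        try:
            return value.encode("latin-1").decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            # Not this kind of mojibake (or already correct): keep as-is.
            return value
    if isinstance(value, list):
        return [fix_mojibake(item) for item in value]
    if isinstance(value, dict):
        return {fix_mojibake(k): fix_mojibake(v) for k, v in value.items()}
    return value  # numbers, booleans, None

# The JSON text from the question; for a file, use json.load(f) instead.
raw = '{"name": "Wroc\\u00c5\\u0082aw"}'
data = fix_mojibake(json.loads(raw))

# ensure_ascii=False writes "ł" literally instead of as a \u escape.
print(json.dumps(data, ensure_ascii=False))  # {"name": "Wrocław"}
```

When writing the result back to a file, open it with encoding="utf-8" so the output does not depend on the platform default. Note that a string consisting only of Latin-1 characters that happen to form valid UTF-8 would also be rewritten, so it is worth spot-checking the output.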
Upvotes: 2