Reputation: 41
I want to read a JSON file containing Cyrillic symbols. The Cyrillic symbols are represented like \u123. Python converts them to '\\u123' instead of the Cyrillic symbol.
For example, the string "\u0420\u0435\u0433\u0438\u043e\u043d" should become "Регион", but becomes "\\u0420\\u0435\\u0433\\u0438\\u043e\\u043d".
encode() just makes the string look like u"..." or adds a new \.
How do I convert "\u0420\u0435\u0433\u0438\u043e\u043d" to "Регион"?
Upvotes: 4
Views: 13953
Reputation: 798626
If you want json to output a string that has non-ASCII characters in it, then you need to pass ensure_ascii=False and then encode manually afterward.
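A minimal sketch of what that looks like, assuming Python 3 and UTF-8 as the target encoding (the variable names are only for illustration):

```python
import json

s = "\u0420\u0435\u0433\u0438\u043e\u043d"  # the str "Регион"

# Default behaviour: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(s)

# With ensure_ascii=False the characters are kept as-is in the output str.
readable = json.dumps(s, ensure_ascii=False)

# "Encode manually afterward": turn the str into bytes yourself.
# UTF-8 is an assumption here; use whatever encoding your consumer expects.
encoded = readable.encode("utf-8")

print(escaped)
print(readable)
```

When writing to a file in Python 3, opening it with encoding='utf-8' and calling json.dump(..., ensure_ascii=False) should have the same effect as the manual encode step.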
Upvotes: 7
Reputation: 177640
Just use the json module.
    import json

    s = "\u0420\u0435\u0433\u0438\u043e\u043d"

    # Generate a JSON file.
    with open('test.json', 'w', encoding='ascii') as f:
        json.dump(s, f)

    # Reading it directly shows the raw \uXXXX escapes.
    with open('test.json') as f:
        print(f.read())

    # Reading with the json module decodes the escapes.
    with open('test.json', encoding='ascii') as f:
        data = json.load(f)
    print(data)
Output:
"\u0420\u0435\u0433\u0438\u043e\u043d"
Регион
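If the text has already been read without parsing and still holds literal backslash escapes, one possible way to decode it is to hand it back to the json module as a JSON string literal. This is only a sketch: raw is a hypothetical variable, and it assumes the text contains no unescaped double quotes or other characters that are invalid inside a JSON string.

```python
import json

# Assumption: `raw` holds literal backslash-u sequences (6 characters each),
# e.g. because the file was read as plain text instead of parsed as JSON.
raw = r"\u0420\u0435\u0433\u0438\u043e\u043d"

# Wrap the text in double quotes so it forms a JSON string, then parse it.
decoded = json.loads('"' + raw + '"')

print(decoded)
```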
Upvotes: 0