Влад Кныш

Reputation: 41

How to encode Cyrillic characters in JSON

I want to read a JSON file containing Cyrillic symbols.

The Cyrillic symbols are stored as \uXXXX escape sequences, for example \u0420.

Python gives me the escaped text '\\u0420' instead of the Cyrillic symbol.

For example, the string "\u0420\u0435\u0433\u0438\u043e\u043d" should become "Регион", but becomes "\\u0420\\u0435\\u0433\\u0438\\u043e\\u043d".

encode() just makes the string look like u"..." or adds another backslash.

How do I convert "\u0420\u0435\u0433\u0438\u043e\u043d" to "Регион"?

Upvotes: 4

Views: 13953

Answers (2)

Ignacio Vazquez-Abrams

Reputation: 798626

If you want json to output a string that contains non-ASCII characters, pass ensure_ascii=False and then encode the result manually afterward.
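As a minimal sketch of that suggestion (the sample dictionary and variable names here are made up for illustration), this compares the default escaped output with ensure_ascii=False, followed by an explicit UTF-8 encode:

```python
import json

# Hypothetical sample data, not taken from the question's file.
data = {"region": "Регион"}

# Default behaviour: every non-ASCII character becomes a \uXXXX escape.
escaped = json.dumps(data)
# escaped is the text: {"region": "\u0420\u0435\u0433\u0438\u043e\u043d"}

# ensure_ascii=False keeps the Cyrillic characters in the resulting str.
readable = json.dumps(data, ensure_ascii=False)
# readable is the text: {"region": "Регион"}

# Encode explicitly when bytes are needed, e.g. for a binary file or a socket.
encoded = readable.encode("utf-8")

# Both forms parse back to the same Python object.
assert json.loads(escaped) == json.loads(encoded.decode("utf-8")) == data
```

In Python 3, writing `readable` to a file opened with encoding='utf-8' performs the encoding step for you.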

Upvotes: 7

Mark Tolonen

Reputation: 177640

Just use the json module. json.load decodes the \uXXXX escapes into the actual characters for you.

import json

s = "\u0420\u0435\u0433\u0438\u043e\u043d"

# Generate a json file.
with open('test.json','w',encoding='ascii') as f:
    json.dump(s,f)

# Reading it directly
with open('test.json') as f:
    print(f.read())

# Reading with the json module
with open('test.json',encoding='ascii') as f:
    data = json.load(f)
print(data)

Output:

"\u0420\u0435\u0433\u0438\u043e\u043d"
Регион
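If the text is already in a Python str with literal backslashes (the '\\u0420...' form described in the question), one possible approach is to wrap it in double quotes and let json.loads parse it as a JSON string literal. This is only a sketch; it assumes the text contains no unescaped double quotes or invalid escape sequences, and the variable names are illustrative:

```python
import json

# A str holding literal backslash-u sequences, as described in the question.
raw = "\\u0420\\u0435\\u0433\\u0438\\u043e\\u043d"

# Wrap it in quotes so it forms a valid JSON string literal, then parse it.
# Assumption: raw has no unescaped double quotes or malformed escapes.
decoded = json.loads('"' + raw + '"')

# decoded now holds the six Cyrillic characters of "Регион".
assert decoded == "Регион"
```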

Upvotes: 0
