user1757703


Python: Cyrillic handling

I got this data back from an API: b'\\u041a\\u0435\\u0439\\u0442\\u043b\\u0438\\u043d\\u043f\\u0440\\u043e'. I know for sure that the data is in Russian. I am guessing these values are Unicode escape sequences for the Cyrillic letters?

The data returned was a bytes object.

How can I convert it into a readable Cyrillic string? Essentially, I need a way to turn this kind of data into human-readable text.

EDIT: Yes, this is JSON data. I forgot to mention that, sorry.


Answers (1)

Martijn Pieters

Chances are you have JSON data; JSON uses \uhhhh escape sequences to represent Unicode code points. Use the json.loads() function on Unicode (decoded) text to produce a Python string:

import json

# data is the bytes object returned by the API; decode it to text first
string = json.loads(data.decode('utf8'))

UTF-8 is the default JSON encoding; check your response headers (if you are using an HTTP-based API) to see whether a different encoding was used.
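A minimal sketch of that header check, assuming you have the raw body bytes and the Content-Type header value as a plain string. The helper name decode_json_response is hypothetical; it uses the standard library's email.message.Message only to parse the charset parameter, and falls back to UTF-8 when none is given:

```python
import json
from email.message import Message


def decode_json_response(body: bytes, content_type: str) -> object:
    """Hypothetical helper: decode a JSON body using the Content-Type charset.

    Falls back to UTF-8 (the JSON default) when no charset parameter is present.
    """
    # Message parses MIME header parameters such as "; charset=..."
    headers = Message()
    headers["Content-Type"] = content_type
    charset = headers.get_content_charset("utf-8")
    return json.loads(body.decode(charset))


data = b'"\\u041a\\u0435\\u0439\\u0442\\u043b\\u0438\\u043d\\u043f\\u0440\\u043e"'
print(decode_json_response(data, "application/json"))
print(decode_json_response(data, "application/json; charset=utf-8"))
```

With an HTTP client library you would pass in whatever that library exposes as the response body and the Content-Type header value.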

Demo:

>>> import json
>>> json.loads(b'"\\u041a\\u0435\\u0439\\u0442\\u043b\\u0438\\u043d\\u043f\\u0440\\u043e"'.decode('utf8'))
'Кейтлинпро'
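Two related points, shown in the sketch below: since Python 3.6, json.loads() also accepts bytes directly and detects UTF-8, UTF-16 or UTF-32 itself, so the explicit decode is optional for those encodings. The \uhhhh escapes most likely appear because the producing side serialised with an ASCII-only setting; Python's own json.dumps() does this by default (ensure_ascii=True), and ensure_ascii=False keeps the Cyrillic characters as-is:

```python
import json

# Raw bytes as received from the API (a JSON string literal with \uXXXX escapes)
data = b'"\\u041a\\u0435\\u0439\\u0442\\u043b\\u0438\\u043d\\u043f\\u0440\\u043e"'

# Python 3.6+: json.loads() accepts bytes and detects UTF-8/16/32 itself
text = json.loads(data)
print(text)  # Кейтлинпро

# json.dumps() escapes non-ASCII characters by default (ensure_ascii=True)
print(json.dumps(text))

# With ensure_ascii=False the Cyrillic characters are written unescaped
print(json.dumps(text, ensure_ascii=False))
```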

