Reputation: 161
Good day! I'm having trouble decoding text to Unicode. I have an ASCII str equal to
'\u4038' # or something like that
and I need to convert it to ONE Unicode character. Can you please explain how to do that? The expression
len(unicode('\u4038'))
gives 6, so this is not a solution :(
In case it matters, the resulting character is Cyrillic in most cases.
Upvotes: 1
Views: 1074
Reputation: 368904
If you mean you have a string '\\u4038', you can use the unicode-escape encoding:
>>> s = b'\\u4038' # == br'\u4038'
>>> print(s)
\u4038
>>> len(s)
6
>>> print(s.decode('unicode-escape'))
䀸
>>> len(s.decode('unicode-escape'))
1
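The session above uses a byte string, which is what a Python 2 str is. As a minimal sketch of the same idea on Python 3, assuming the input is a str that contains only ASCII characters, you can encode it to bytes first and then decode it. The helper name decode_escapes is hypothetical, chosen only for this illustration:

```python
def decode_escapes(text):
    # Hypothetical helper: turns literal backslash-u sequences in an
    # ASCII-only str into the characters they name.
    # encode('ascii') raises UnicodeEncodeError if text is not pure ASCII.
    return text.encode('ascii').decode('unicode_escape')


s = '\\u4038'            # six characters: backslash, u, 4, 0, 3, 8
decoded = decode_escapes(s)
print(len(s))            # length of the escaped form
print(len(decoded))      # length after decoding
```

The same call handles Cyrillic escapes, for example '\\u0416' decodes to the single letter Ж.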
Upvotes: 3
Reputation: 500167
There's probably a better way, but here is one:
In [26]: import ast
In [27]: s = r'\u4038'
In [28]: len(ast.literal_eval('u"' + s + '"'))
Out[28]: 1
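For reference, a rough sketch of the same trick as a reusable function on Python 3, where plain string literals are already Unicode and the u prefix is not needed. The name literal_unescape is hypothetical, and the sketch assumes the input contains no double quotes, newlines, or trailing backslash, since any of those would break the literal that gets built:

```python
import ast


def literal_unescape(s):
    # Hypothetical helper: wraps s in double quotes and lets
    # ast.literal_eval parse it as a Python string literal.
    # Assumes s has no double quotes, newlines, or trailing backslash.
    return ast.literal_eval('"' + s + '"')


raw = r'\u4038'
result = literal_unescape(raw)
print(len(raw), len(result))
```

Because ast.literal_eval only accepts literals, this does not execute arbitrary code, but malformed input will still raise an exception.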
Upvotes: 2