Decode unicode string in Python

Question

I'd like to decode the following string:

t\u028c\u02c8m\u0251\u0279o\u028a\u032f

It should be the IPA of 'tomorrow' as given in a JSON string from http://rhymebrain.com/talk?function=getWordInfo&word=tomorrow

My understanding is that it should be something like:

x = 't\u028c\u02c8m\u0251\u0279o\u028a\u032f'
print x.decode()

I have tried the solutions from four other questions (and several others that more or less apply), along with several permutations of their parts, but I can't get it to work.

Thank you

Justin O Barber · Accepted Answer

You need a u before your string (in Python 2.x, which you appear to be using) to indicate that this is a unicode string:

>>> x = u't\u028c\u02c8m\u0251\u0279o\u028a\u032f'  # note the u
>>> print x
tʌˈmɑɹoʊ̯

If the text is already stored in a variable as a plain (byte) string, you can pass it to the unicode constructor with the unicode-escape codec to convert it into unicode:

>>> s = 't\u028c\u02c8m\u0251\u0279o\u028a\u032f'  # your string has a unicode-escape encoding but is not unicode
>>> x = unicode(s, encoding='unicode-escape')
>>> print x
tʌˈmɑɹoʊ̯
>>> x
u't\u028c\u02c8m\u0251\u0279o\u028a\u032f'  # a unicode string
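
For readers on Python 3, where every str is already unicode and the unicode constructor no longer exists, a rough equivalent might look like the sketch below. It assumes the escapes arrive as literal backslash sequences (as they would in raw JSON text), which is modeled here with a raw string. The names raw, decoded and from_json are illustrative only. Since the data comes from a JSON response, letting the json module parse it is probably the more robust route.

```python
import json

# Literal backslash escapes, as they appear in the raw JSON text (assumed input).
raw = r't\u028c\u02c8m\u0251\u0279o\u028a\u032f'

# Option 1: interpret the escapes with the unicode_escape codec.
# This assumes the input is pure ASCII apart from the escapes themselves.
decoded = raw.encode('ascii').decode('unicode_escape')

# Option 2: wrap the text in quotes and let the JSON parser handle the escapes.
from_json = json.loads('"' + raw + '"')

print(decoded)
print(from_json == decoded)
```

In practice you would normally call json.loads on the whole response body rather than on a single value, and the strings inside the resulting structure would already be decoded.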
