Reputation: 1547
I am using an API from Python 2.7 that returns a string whose content is not known in advance; it can be in English, German or French. The returned string is assigned to the variable 'category'. An example of a returned value is:
"temp\\u00eate de poussi\\u00e8res"
I tried category.decode('utf-8') to decode the string into, in the above case, French, but it still returns the same value, only with an additional unicode 'u' prefix when I print the result of category.decode('utf-8'):
u'"temp\\u00eate de poussi\\u00e8res"'
I also tried category.encode('utf-8'), but it returns the same value (minus the 'u' that precedes the string):
'"temp\\u00eate de poussi\\u00e8res"'
Any suggestions?
Upvotes: 1
Views: 1632
Reputation: 177901
It looks like the API uses JSON. You can decode it with the json module:
>>> import json
>>> json.loads('"temp\\u00eate de poussi\\u00e8res"')
u'temp\xeate de poussi\xe8res'
>>> print(json.loads('"temp\\u00eate de poussi\\u00e8res"'))
tempête de poussières
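Since the question says the content of the string is unknown, a slightly more defensive variant may be useful. The following is only a sketch: the helper name decode_category is hypothetical, and it assumes that anything that is not a JSON string literal should be passed through unchanged.

```python
import json


def decode_category(raw):
    """Return the text of a JSON string literal, or raw unchanged if it is not one."""
    try:
        value = json.loads(raw)
    except ValueError:  # on Python 3, json.JSONDecodeError is a subclass of ValueError
        return raw
    # json.loads can also return numbers, lists, dicts, etc.; only accept text here.
    # type(u"") is unicode on Python 2 and str on Python 3.
    return value if isinstance(value, type(u"")) else raw


category = '"temp\\u00eate de poussi\\u00e8res"'
print(decode_category(category))
```

Falling back to the raw value is a design choice made for illustration; raising an error instead may be more appropriate if the API is always expected to return JSON.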
Upvotes: 1
Reputation: 98436
I think you have literal backslashes in your string, not Unicode characters.
That is, \u00ea is the Unicode escape sequence for ê, but \\u00ea is actually a backslash (escaped) followed by the letter u, two zeros and two letters.
Similarly for the quotation marks: your first and last characters are literal double quotes (").
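The difference between the two spellings can be checked by counting characters. This is a small illustration; the variable names are made up for the example, and the u prefix is used so that it behaves the same on Python 2.7 and Python 3.3+.

```python
one_char = u'\u00ea'    # escape processed by Python: a single character
six_chars = '\\u00ea'   # escaped backslash: backslash, u, 0, 0, e, a

print(len(one_char))    # 1
print(len(six_chars))   # 6
print(list(six_chars))  # the six individual characters
```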
You can convert each backslash-plus-code-point sequence into its equivalent character with:
x = '"temp\\u00eate de poussi\\u00e8res"'
d = x.decode("unicode_escape")
print d
The output is:
"tempête de poussières"
Note that to see the proper international characters you have to use print. If instead you just type d in the interactive Python shell, you get:
u'"temp\xeate de poussi\xe8res"'
where \xea is equivalent to \u00ea, that is, the escape sequence for ê.
Removing the quotes, if required, is left as an exercise to the reader ;-).
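For readers who want the exercise spelled out, here is one possible sketch. The function name unescape_and_unquote is hypothetical, and it assumes that at most one pair of surrounding double quotes should be dropped. It uses codecs.decode rather than the str.decode method so that the same code also runs on Python 3.

```python
import codecs


def unescape_and_unquote(raw):
    """Expand literal backslash-u escapes, then drop one pair of surrounding double quotes."""
    text = codecs.decode(raw, 'unicode_escape')
    # Only strip when the text both starts and ends with a double quote.
    if len(text) >= 2 and text[0] == '"' and text[-1] == '"':
        text = text[1:-1]
    return text


print(unescape_and_unquote('"temp\\u00eate de poussi\\u00e8res"'))
```

One caveat: the unicode_escape codec interprets any non-escaped bytes as Latin-1, so if the input may already contain raw non-ASCII characters, the json module is likely the safer route.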
Upvotes: 2