Reputation: 241
I am trying to convert a plain string into the special character it describes, so that I can use it in my logic in Python 2.
word = 'Tb\u03b1'
word = unicode('Tb\u03b1')
if word.encode('utf-8') == u'Tb\u03b1'.encode('utf-8'):
    print 'They are equals'
print word.encode('utf-8')
print type(word.encode('utf-8'))
print u'Tb\u03b1'.encode('utf-8')
print type(u'Tb\u03b1'.encode('utf-8'))
I am getting this output:
Tb\u03b1
<type 'str'>
Tbα
<type 'str'>
My question is: since I apply unicode() to the word, shouldn't I get the same output on lines 1 and 3? I would like to get the output of line 3, because I need to do some logic based on that special character.
Upvotes: 1
Views: 1090
Reputation: 50190
The problem is that \u has no special meaning in a non-unicode literal, so it remains as \u in your string. To interpret the \u escapes and produce the corresponding Unicode, use the encoding "unicode_escape":
>>> as_str = "\u03b1"
>>> as_unicode = as_str.decode(encoding="unicode_escape")
>>> print as_unicode
α
But you'd be better off if you could figure out a way to avoid being in this situation. Even better, switch to Python 3 where these kinds of things make a lot more sense.
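Applied to the word from the question, the same idea might look like the sketch below. The helper name decode_escapes is made up for illustration, and the sketch assumes the input contains only ASCII characters (anything else makes the encode step raise). The ASCII round-trip is there so the same line runs on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function


def decode_escapes(raw):
    """Turn literal backslash-u sequences in an ASCII string into real characters."""
    # Hypothetical helper: go through ASCII bytes first so that .decode()
    # is available on Python 3 as well as Python 2.
    return raw.encode('ascii').decode('unicode_escape')


# Eight characters: T, b, backslash, u, 0, 3, b, 1 (no alpha yet).
word = 'Tb\\u03b1'
decoded = decode_escapes(word)

# Compare Unicode text with Unicode text; no need to encode both sides.
if decoded == u'Tb\u03b1':
    print('They are equal')
```

Once the escapes are decoded, the comparison can be done directly on the Unicode values, and encoding to UTF-8 is only needed when writing the result out.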
Upvotes: 2