Reputation: 223
I wonder why, when I define:
a = [u'k',u'ę',u'ą']
and then type:
'k' in a
I get True, while:
'ę' in a
will give me False?
It really gives me a headache, and it seems someone did this on purpose to make people mad...
Upvotes: 10
Views: 18324
Reputation: 28232
And why is this?
In Python 2.x, a str and a unicode object can't be compared directly when the str contains non-ASCII bytes. Python tries to decode the str as ASCII, fails, treats the two as unequal, and issues a warning:
Warning (from warnings module):
File "__main__", line 1
UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
However, in Python 3.x this doesn't appear, as all strings are unicode objects.
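To make the failed conversion concrete, here is a minimal sketch that performs by hand the ASCII decode Python 2 attempts implicitly. The variable names are only illustrative, and it assumes the source file is saved as UTF-8:

```python
# -*- coding: utf-8 -*-
# Sketch: reproduce by hand the conversion Python 2 attempts implicitly
# when a str is compared with a unicode object.
text = u'ę'
raw = text.encode('utf-8')  # the bytes a plain 'ę' literal holds in a UTF-8 file

try:
    raw.decode('ascii')     # the implicit step; fails for non-ASCII bytes
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

print(decoded_ok)                   # False
print(raw.decode('utf-8') == text)  # True: decoding with the right codec works
```

Decoding explicitly with the encoding the bytes were actually written in is what makes the comparison meaningful.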
Solution?
You can either make the string unicode:
>>> u'ę' in a
True
Now you're comparing two unicode objects, not a unicode object with a str.
Or encode both to the same encoding, for example UTF-8, before comparing:
>>> c = u"ç"
>>> u'ç'.encode('utf-8') == c.encode('utf-8')
True
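If the value to look up arrives as a byte string (from a file or a terminal, say), another option is to decode it once before the membership test. This is only a sketch: `to_text` is a hypothetical helper, not a standard function, and it assumes the bytes are UTF-8.

```python
# -*- coding: utf-8 -*-
def to_text(value, encoding='utf-8'):
    """Return value as text, decoding it first if it is a byte string.

    Hypothetical helper; the utf-8 default is an assumption.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

a = [u'k', u'ę', u'ą']
needle = u'ę'.encode('utf-8')  # stands in for a byte string from outside the program

print(needle in a)           # False: bytes compared against text
print(to_text(needle) in a)  # True: text compared against text
```

Decoding at the boundary keeps everything inside the program as unicode, so no comparison ever mixes the two types.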
Also, to use non-ASCII characters in a Python 2 source file, you have to declare the encoding at the top of the file:
# -*- coding: utf-8 -*-
#the whole program
Hope this helps!
Upvotes: 15
Reputation: 103694
Make sure that you specify the source code encoding and use u in front of unicode literals.
This works on both Python 2 and Python 3 (3.3 or later, where the u prefix is accepted again):
#!/usr/bin/python
# -*- coding: utf-8 -*-
a = [u'k',u'ę',u'ą']
print(u'ę' in a)
# True
Upvotes: 0
Reputation: 69021
u'ę' is a unicode object, while 'ę' is a str object: a sequence of bytes in whatever encoding your locale or source file uses. For plain ASCII characters the two compare equal, but for non-ASCII characters the implicit ASCII decoding fails and they do not.
One of the nice things about Python 3 is that all text is unicode, so this particular problem goes away.
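A short sketch of what that looks like, assuming Python 3.3 or later:

```python
# Python 3: plain literals are already Unicode text.
a = ['k', 'ę', 'ą']

print('ę' in a)             # True, no u prefix needed
print(type(u'ę') is str)    # True: the u prefix is accepted but changes nothing
```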
Upvotes: 1
Reputation: 34914
You need to explicitly make the string unicode. The following shows an example, and the warning given when you do not specify it as unicode:
>>> a = [u'k',u'ę',u'ą']
>>> 'k' in a
True
>>> 'ę' in a
__main__:1: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
False
>>> u'ę' in a
True
Upvotes: 4