Kulawy Krul

Reputation: 223

Comparing string and unicode in Python 2.7.5

I wonder why, when I define:

a = [u'k',u'ę',u'ą']

and then type:

'k' in a

I get True, while:

'ę' in a

will give me False?

It really gives me a headache, and it seems as if someone did this on purpose to drive people mad...

Upvotes: 10

Views: 18324

Answers (4)

aIKid

Reputation: 28232

And why is this?

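In Python 2.x, you can only compare a unicode object to a str (byte string) directly if the str is pure ASCII. Python implicitly tries to decode the str as ASCII; for non-ASCII bytes that fails, the two are treated as unequal, and you get a warning: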

Warning (from warnings module):
  File "__main__", line 1
UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal

In Python 3.x this problem doesn't arise, because all string literals are unicode.
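
To make the failing step visible, here is a rough sketch of the implicit ASCII decode that Python 2 attempts during the comparison. The helper name `ascii_decodable` is made up for this example, and the bytes assume that 'ę' was typed in a UTF-8 terminal or source file:

```python
# -*- coding: utf-8 -*-
# Assumption: these are the bytes 'ę' becomes under UTF-8.
raw = b'\xc4\x99'

def ascii_decodable(data):
    """Return True if decoding the byte string as ASCII succeeds (hypothetical helper)."""
    try:
        data.decode('ascii')
        return True
    except UnicodeDecodeError:
        return False

print(ascii_decodable(b'k'))  # True: plain ASCII converts, so 'k' == u'k' works
print(ascii_decodable(raw))   # False: the conversion fails, so the comparison gives False
```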

Solution?

You can either make the string unicode:

>>> u'ę' in a
True

Now you're comparing two unicode objects, not a unicode object to a byte string.

Or encode both sides to the same encoding, for example UTF-8, before comparing:

>>> c = u"ę"
>>> u'ę'.encode('utf-8') == c.encode('utf-8')
True
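
If the byte strings come from somewhere else (a file, user input) and you can't just add the u prefix, you can go the other way and decode them once, then compare unicode to unicode. A small sketch, where `to_text` is a hypothetical helper name and UTF-8 is an assumed encoding for the incoming bytes:

```python
# -*- coding: utf-8 -*-
def to_text(value, encoding='utf-8'):
    """Decode byte strings to unicode; return unicode objects unchanged."""
    if isinstance(value, bytes):  # bytes is an alias of str in Python 2
        return value.decode(encoding)
    return value

a = [u'k', u'\u0119', u'\u0105']  # same list as in the question: k, ę, ą
raw = b'\xc4\x99'                 # assumed UTF-8 bytes for ę

print(to_text(raw) in a)          # True
```

This only works if you know the real encoding of the bytes; decoding with the wrong one will either raise an error or silently give a different character.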

Also, to use non-ASCII characters in a Python 2 source file, you have to declare the file's encoding at the top:

# -*- coding: utf-8 -*-

# the whole program

Hope this helps!

Upvotes: 15

dawg

Reputation: 103694

Make sure that you specify the source code encoding and use u in front of unicode literals.

This works on both Python 2 and Python 3 (3.3 or later, where the u prefix is accepted again):

#!/usr/bin/python
# -*- coding: utf-8 -*-

a = [u'k',u'ę',u'ą']

print(u'ę' in a)
# True

Upvotes: 0

Ethan Furman

Reputation: 69021

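u'ę' is a unicode object, while 'ę' is a str object: a sequence of bytes in whatever encoding your terminal or source file uses. Which bytes you get depends on that encoding, and Python 2 can only convert them to unicode implicitly when they are plain ASCII, so for non-ASCII characters the two will not compare equal.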

One of the nice things about Python 3 is that all text is unicode, so this particular problem goes away.
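
As a quick illustration of why the bytes behind 'ę' are not fixed, here is the same character encoded two ways. UTF-8 and ISO-8859-2 are picked just for the example; your environment may use something else:

```python
# -*- coding: utf-8 -*-
char = u'\u0119'  # ę

utf8_bytes = char.encode('utf-8')         # two bytes
latin2_bytes = char.encode('iso-8859-2')  # one byte

print(repr(utf8_bytes))
print(repr(latin2_bytes))
print(utf8_bytes == latin2_bytes)  # False: same character, different bytes
```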

Upvotes: 1

jordanm

Reputation: 34914

You need to explicitly make the string literal unicode. The following session shows an example, including the warning you get when you leave off the u prefix:

>>> a = [u'k',u'ę',u'ą']
>>> 'k' in a
True
>>> 'ę' in a
__main__:1: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
False
>>> u'ę' in a
True

Upvotes: 4
