lzy9059

Reputation: 111

How can I compare a unicode string with a str containing Chinese text in Python?

I'm using Python 2.7. For example:

a = u'你好'
b = '你好'

I tried the following code, but it failed:

print a.encode('UTF-8') == b  # prints False

How can I compare them so that they come out equal?

Upvotes: 1

Views: 343

Answers (2)

roeland

Reputation: 5751

Very likely your Python source file isn't encoded in UTF-8. The variable b will contain whatever bytes are between those quotes, and those bytes depend on the encoding the file was saved in. For example:

# coding: utf-8
print repr("你好")

prints: '\xe4\xbd\xa0\xe5\xa5\xbd'

Now if we save our source file as GB2312 and update the declaration:

# coding: GB2312
print repr("你好")

prints: '\xc4\xe3\xba\xc3'

In any case, if you have a byte array with text, you also need to know the encoding of those bytes, otherwise you can't reliably interpret them.
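
As a minimal sketch of that point: once the encoding of the bytes is known, they can be decoded and compared as text. The helper name same_text is hypothetical (it is not from the question), and the byte values are the ones printed in the two examples above. This should run unchanged on Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
# Hypothetical helper for illustration; not part of the original answer.

def same_text(text, raw, encoding):
    """Return True if the bytes `raw`, decoded with `encoding`, equal `text`."""
    try:
        return raw.decode(encoding) == text
    except UnicodeDecodeError:
        # The bytes are not valid in this encoding, so they cannot match.
        return False

a = u'你好'
utf8_bytes = b'\xe4\xbd\xa0\xe5\xa5\xbd'   # u'你好' encoded as UTF-8
gb2312_bytes = b'\xc4\xe3\xba\xc3'         # u'你好' encoded as GB2312

print(same_text(a, utf8_bytes, 'utf-8'))      # True
print(same_text(a, gb2312_bytes, 'gb2312'))   # True
print(same_text(a, gb2312_bytes, 'utf-8'))    # False: wrong encoding assumed
```

Decoding the bytes to unicode before comparing, rather than encoding the unicode string, keeps the comparison in text and makes the assumed encoding explicit.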

If you need UTF-8 bytes regardless of the source file encoding, you can write u'你好'.encode('utf-8'), which will always return '\xe4\xbd\xa0\xe5\xa5\xbd'.

Upvotes: 1

Rahul K P

Reputation: 16081

If you are using Python 3, both variables are already strings of the same type (str), so you don't need to convert either of them. Simply compare them.

>>> a = u'你好'
>>> b = '你好'
>>> type(a)
<class 'str'>
>>> type(b)
<class 'str'>
>>> a == b
True

If you are using Python 2, your attempt will work, as long as the source file is actually saved as UTF-8.
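
For the Python 3 case, here is a small sketch of what changes once one side is bytes rather than str. It assumes Python 3 only, and the variable name raw is just for illustration.

```python
# Python 3 only: the u prefix is accepted but has no effect.
a = u'你好'
b = '你好'
raw = a.encode('utf-8')   # bytes, not str

print(a == b)                     # True: both are str
print(raw == b)                   # False: bytes and str never compare equal
print(raw.decode('utf-8') == b)   # True: decode first, then compare as text
```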

Upvotes: 1
