Reputation: 111
I'm using Python 2.7. For example:
a = u'你好'
b = '你好'
I tried the following code, but the comparison failed:
print a.encode('UTF-8') == b  # prints False
How can I compare them so that they are equal?
Upvotes: 1
Views: 343
Reputation: 5751
Very likely your Python source file isn't encoded in UTF-8. The variable b
will contain whatever bytes are between those quotes, and those bytes depend on the encoding the source file was saved in. For example:
# coding: utf-8
print repr("你好")
prints: '\xe4\xbd\xa0\xe5\xa5\xbd'
Now if we save our source file as GB2312 and update the declaration:
# coding: GB2312
print repr("你好")
prints: '\xc4\xe3\xba\xc3'
In any case, if you have a byte array with text, you also need to know the encoding of those bytes, otherwise you can't reliably interpret them.
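As a rough illustration of that point, the sketch below encodes the same text with two codecs and then decodes the bytes again. The variable names are made up for this example, and the text is written with \u escapes so the result does not depend on how the file itself is saved.

```python
# -*- coding: utf-8 -*-
# Illustrative sketch; variable names are hypothetical.
text = u'\u4f60\u597d'  # the two characters from the question

# The same text produces different bytes under different encodings.
utf8_bytes = text.encode('utf-8')    # '\xe4\xbd\xa0\xe5\xa5\xbd'
gb_bytes = text.encode('gb2312')     # '\xc4\xe3\xba\xc3'

# Decoding with the matching encoding recovers the original text.
utf8_roundtrip = utf8_bytes.decode('utf-8') == text
gb_roundtrip = gb_bytes.decode('gb2312') == text

# Decoding with the wrong encoding cannot be trusted; here it fails outright.
try:
    gb_bytes.decode('utf-8')
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False
```

Here the wrong codec happens to raise an error; in other cases it may silently produce different characters, which is why the encoding has to be known rather than guessed.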
If you need UTF-8 bytes regardless of source file encoding, you can write u'你好'.encode('utf-8'),
which will always return '\xe4\xbd\xa0\xe5\xa5\xbd'.
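One way to make the comparison robust is to decode the byte string with its known encoding and compare the resulting unicode objects. The following is only a sketch: same_text is a hypothetical helper, not a standard function, and it assumes you know which encoding the bytes are in.

```python
# -*- coding: utf-8 -*-

def same_text(unicode_text, raw_bytes, encoding):
    """Hypothetical helper: decode raw_bytes with the given encoding,
    then compare the result with unicode_text."""
    return raw_bytes.decode(encoding) == unicode_text

a = u'\u4f60\u597d'                    # unicode text from the question
b_utf8 = b'\xe4\xbd\xa0\xe5\xa5\xbd'   # bytes when the source is UTF-8
b_gb = b'\xc4\xe3\xba\xc3'             # bytes when the source is GB2312

utf8_match = same_text(a, b_utf8, 'utf-8')   # True
gb_match = same_text(a, b_gb, 'gb2312')      # True

# Encoding a as UTF-8 and comparing against GB2312 bytes does not match,
# which is the likely cause of the False in the question.
naive_match = a.encode('utf-8') == b_gb      # False
```

Comparing on the unicode side keeps the check independent of how the source file was saved, as long as the encoding passed in is the right one.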
Upvotes: 1
Reputation: 16081
If you are using Python 3, both variables are of type str,
so you don't need to convert either of them. Simply compare them directly:
>>> a = u'你好'
>>> b = '你好'
>>> type(a)
<class 'str'>
>>> type(b)
<class 'str'>
>>> a == b
True
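If you are using Python 2, your attempt will work,
provided the bytes in b are actually UTF-8 (for example, when the source file is saved as UTF-8).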
Upvotes: 1