In Python 2, is there a reason not to use str() to convert a unicode string to a str, instead of encoding it explicitly?
>>> str(u'a')
'a'
>>> str(u'a').__class__
<type 'str'>
>>> u'a'.encode('utf-8')
'a'
>>> u'a'.encode('utf-8').__class__
<type 'str'>
>>> u'a'.encode().__class__
<type 'str'>
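UPDATE: Thanks for the answer. I also didn't know that if I create a byte string containing a special character, it ends up holding UTF-8 bytes. (Strictly, nothing is converted: the literal keeps the raw bytes the terminal sent, which here are UTF-8.)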
>>> a = '€'
>>> a.__class__
<type 'str'>
>>> a
'\xe2\x82\xac'
Also, in Python 3 a plain literal such as '€' is already a Unicode (str) object.
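A minimal Python 3 sketch of that last point, added for illustration and not part of the original exchange: a plain literal is already Unicode text, and encoding it to UTF-8 produces the same three bytes, '\xe2\x82\xac', that the Python 2 byte string held. The variable names are illustrative only.

```python
# Python 3: a plain string literal is Unicode text (type str).
euro = "\u20ac"  # the euro sign, a single code point U+20AC

print(type(euro))  # <class 'str'>
print(len(euro))   # 1

# Encoding to UTF-8 gives the bytes the Python 2 byte string contained.
euro_bytes = euro.encode("utf-8")
print(euro_bytes)       # b'\xe2\x82\xac'
print(len(euro_bytes))  # 3

# Decoding with the same codec recovers the original text.
print(euro_bytes.decode("utf-8") == euro)  # True
```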
Answer:
When you write str(u'a'), it converts the Unicode string to a byte string using the default encoding, which (unless you've gone to the trouble of changing it) is ASCII.
The second version, u'a'.encode('utf-8'), explicitly encodes the string as UTF-8.
The difference becomes apparent if you try it with a string containing non-ASCII characters. The second version still works:
>>> u'€'.encode('utf-8')
'\xe2\x82\xac'
The first version raises an exception:
>>> str(u'€')
Traceback (most recent call last):
  ...
UnicodeEncodeError: 'ascii' codec can't encode character u'\u20ac' in position 0: ordinal not in range(128)
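The same contrast can be sketched in Python 3, where str is always Unicode text and turning it into bytes requires an explicit encoding. This is an illustrative sketch, not the answer's code: encoding with 'ascii' stands in for what Python 2's str(u'...') does implicitly with its default encoding, and the variable names are hypothetical.

```python
euro = "\u20ac"  # the euro sign

# Explicit UTF-8 encoding succeeds: every code point has a UTF-8 form.
utf8_bytes = euro.encode("utf-8")
print(utf8_bytes)  # b'\xe2\x82\xac'

# ASCII encoding (what Python 2's str() used by default) fails for any
# code point above 127. Keep the exception, since `exc` is unbound after the block.
ascii_error = None
try:
    euro.encode("ascii")
except UnicodeEncodeError as exc:
    ascii_error = exc

print(ascii_error.encoding, ascii_error.start, ascii_error.reason)
# ascii 0 ordinal not in range(128)

# Pure ASCII text encodes identically either way, which is why str(u'a') looked fine.
print("a".encode("ascii") == "a".encode("utf-8"))  # True
```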