James Lin

Reputation: 26538

Python: what's the difference between str(u'a') and u'a'.encode('utf-8')?

As the title says: is there any reason not to use str() to convert a unicode string to a str?

>>> str(u'a')
'a'
>>> str(u'a').__class__
<type 'str'>
>>> u'a'.encode('utf-8')
'a'
>>> u'a'.encode('utf-8').__class__
<type 'str'>
>>> u'a'.encode().__class__
<type 'str'>

UPDATE: Thanks for the answer. I also didn't know that a str created from a special character is automatically stored as its UTF-8 encoded bytes:

>>> a = '€'
>>> a.__class__
<type 'str'>
>>> a
'\xe2\x82\xac'

Also, in Python 3 a str is a Unicode object.

Upvotes: 8

Views: 2454

Answers (1)

Mark Byers

Reputation: 838276

In Python 2, str(u'a') converts the Unicode string to a bytestring using the default encoding, which is ASCII unless you've gone to the trouble of changing it.

The second version explicitly encodes the string as UTF-8.
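
For pure-ASCII text the two approaches produce the same bytes, which is why the example in the question shows no difference. A minimal sketch of that, assuming the default encoding is ASCII as described above; the explicit `encode('ascii')` call stands in for what `str()` does implicitly in Python 2, and the variable names are only illustrative:

```python
# -*- coding: utf-8 -*-
text = u'a'

# Stand-in for Python 2's implicit str(u'a'), assuming an ASCII default encoding.
ascii_bytes = text.encode('ascii')

# The explicit version from the question.
utf8_bytes = text.encode('utf-8')

# UTF-8 is a superset of ASCII, so ASCII-only text encodes identically.
same_for_ascii = (ascii_bytes == utf8_bytes)
print(same_for_ascii)
```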

The difference is more apparent if you try with a string containing non-ASCII characters. The second version will still work:

>>> u'€'.encode('utf-8')
'\xe2\x82\xac'

The first version will give an exception:

>>> str(u'€')

Traceback (most recent call last):
  File "", line 1, in 
    str(u'€')
UnicodeEncodeError: 'ascii' codec can't encode character u'\u20ac' in position 0: ordinal not in range(128)
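
The contrast can also be sketched as a small script that spells out the codec on both sides, so it behaves the same way on Python 2 and Python 3. This is only an illustration: `encode('ascii')` stands in for the implicit `str()` conversion under an ASCII default encoding, and the `'replace'` fallback at the end is an extra option not discussed above.

```python
# -*- coding: utf-8 -*-
euro = u'\u20ac'  # the euro sign, written as an escape to avoid source-encoding issues

# Explicit UTF-8 encoding works for any Unicode character.
utf8_bytes = euro.encode('utf-8')

# Encoding with the ASCII codec fails for anything outside the 0-127 range.
try:
    euro.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# A lossy fallback: unencodable characters become '?'.
replaced = euro.encode('ascii', 'replace')

print(repr(utf8_bytes), ascii_failed, repr(replaced))
```

So unless the text is guaranteed to be ASCII-only, naming the encoding explicitly is the safer choice.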

Upvotes: 19
