Reputation: 2392
I'm wondering what happens internally when you call str() on a unicode string.
# coding: utf-8
s2 = str(u'hello')
Is s2 just the unicode byte representation of the str() arg?
Upvotes: 0
Views: 327
Reputation: 129109
In Python 2, it will try to encode the string with your default encoding (the one sys.getdefaultencoding() reports). On my system, that's ASCII, and if there are any non-ASCII characters, it will fail:
>>> str(u'あ')
UnicodeEncodeError: 'ascii' codec can't encode character u'\u3042' in position 0: ordinal not in range(128)
Note that this is the same error you'd get if you called encode('ascii') on it:
>>> u'あ'.encode('ascii')
UnicodeEncodeError: 'ascii' codec can't encode character u'\u3042' in position 0: ordinal not in range(128)
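To make the "works for some arguments, fails for others" behaviour concrete, here is a minimal sketch. The helper name try_ascii is hypothetical (it is not part of Python), and the snippet only assumes that encoding to ASCII raises UnicodeEncodeError for characters outside the ASCII range, as shown above:

```python
# -*- coding: utf-8 -*-

def try_ascii(text):
    """Hypothetical helper: return text encoded as ASCII bytes,
    or None if text contains any non-ASCII character."""
    try:
        return text.encode('ascii')
    except UnicodeEncodeError:
        # Same failure as the traceback above, just caught instead of raised.
        return None

ascii_result = try_ascii(u'hello')       # pure ASCII: encoding succeeds
non_ascii_result = try_ascii(u'\u3042')  # u'\u3042' is the character あ: encoding fails
```

The first call returns the encoded bytes and the second returns None, which is exactly the split that makes an implicit ASCII conversion easy to miss in testing.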
As you might imagine, str working on some arguments and failing on others makes it easy to write code that at first glance seems to work, but stops working once you throw some international characters in there. Python 3 avoids this by making the problem blatantly obvious: you can't convert a Unicode string to a byte string without an explicit encoding:
>>> bytes(u'あ')
TypeError: string argument without an encoding
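For comparison, a short Python 3 sketch of the explicit version. UTF-8 is just the encoding assumed for this example; use whatever encoding your data actually needs:

```python
# Python 3: the encoding has to be named explicitly.
text = 'あ'

encoded = text.encode('utf-8')       # str -> bytes via the str method
also_encoded = bytes(text, 'utf-8')  # same conversion via the bytes constructor

# Decoding with the same encoding gives back the original string.
decoded = encoded.decode('utf-8')
```

Both calls produce the same three bytes, b'\xe3\x81\x82', and decoding them returns the original string.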
Upvotes: 5