Reputation: 9522
We have the following formatted string:
'{}: {}.'.format(message, object)
Which raises:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)
The object's string is non-ASCII, but the method is overridden so that it returns a unicode string:
def __str__(self):
    return unicode(self.name)
Why then is a UnicodeEncodeError being raised? What can I do to fix it?
I have tried turning the string into a unicode one:
u'{}: {}.'.format(message, object)
But that messes up the object's string. It returns \xf1\xf1\xf1\xf1 instead of ññññ.
Upvotes: 0
Views: 51
Reputation: 148910
In Python 2, normal strings are byte strings, and __str__ should never return a unicode string: doing so breaks the str contract. If you need unicode conversion for your object, use the __unicode__ special method:
def __unicode__(self):
    return unicode(self.name)
or, even better, return self.name.decode(encoding), where encoding is the encoding of self.name.
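To see where the error in the question comes from: when __str__ returns a unicode object, Python 2 encodes it implicitly with the default ASCII codec. The sketch below performs that step explicitly. The name variable is a stand-in for self.name (an assumption, not the asker's actual data), and the snippet is written so that it also runs on Python 3.

```python
# Stand-in for self.name: four "n with tilde" characters (U+00F1).
name = u'\xf1\xf1\xf1\xf1'

# Encoding with the ASCII codec fails, because U+00F1 is outside range(128).
try:
    name.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError as exc:
    ascii_failed = True
    error_text = str(exc)

# An explicit encoding that can represent the characters succeeds,
# and decoding with the same encoding restores the original text.
utf8_bytes = name.encode('utf-8')
round_trip = utf8_bytes.decode('utf-8')
```

Here error_text should read much like the message in the question, which suggests the failure is the implicit ASCII step rather than anything in the format string itself.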
And never mix unicode strings and byte strings without explicit encoding. So the correct way is:
'{}: {}.'.format(message, unicode(object).encode(encoding))
Here again, encoding represents what you want for the external representation. Common encodings are Latin1 or cp1252 on Windows, and often utf-8 on Linux.
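As a rough illustration of how the choice of encoding changes the bytes you get, here is a small variation on the line above: it formats everything as unicode first and encodes once at the end, rather than encoding the object's text before formatting. The message and name values are made up for the example.

```python
message = u'Saved'            # hypothetical message text
name = u'\xf1\xf1\xf1\xf1'    # stand-in for unicode(object)

# Format while everything is still unicode text.
text = u'{}: {}.'.format(message, name)

# Encode once, at the boundary, with the encoding the output needs.
as_latin1 = text.encode('latin-1')   # one byte per character here
as_cp1252 = text.encode('cp1252')    # same bytes as Latin1 for this text
as_utf8 = text.encode('utf-8')       # two bytes for each U+00F1
```

The \xf1 escapes seen in the question are most likely just how Python 2 shows ñ in the repr of a unicode string, so the text itself was probably intact at that point.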
Upvotes: 3
Reputation: 536
I recommend using the decode and encode functions, as follows:
class A(object):
    def __str__(self):
        return "速度快".decode("utf-8", "ignore")

obj = A()
print u"{}".format(obj)
Note the u prefix added to the format string.
Upvotes: 0