dabadaba

Reputation: 9522

UnicodeEncodeError when __str__ already returns unicode

We have the following formatted string:

'{}: {}.'.format(message, object)

Which raises:

UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)

The object's string is non-ASCII, but __str__ is overridden so that it returns a unicode string:

def __str__(self):
    return unicode(self.name)

Why then is a UnicodeEncodeError being raised? What can I do to fix it?

I have tried turning the string into a unicode one:

u'{}: {}.'.format(message, object)

But that messes up the object's string. It returns \xf1\xf1\xf1\xf1 instead of ññññ.

Upvotes: 0

Views: 51

Answers (2)

Serge Ballesta

Reputation: 148910

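In Python 2, normal strings are byte strings, and __str__ should never return a unicode string: doing so breaks the str contract. When __str__ returns unicode, Python 2 converts the result to a byte string with the default ascii codec, and that implicit conversion is what raises the UnicodeEncodeError for a non-ASCII name. If you need unicode conversion for your object, use the __unicode__ special method: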

def __unicode__(self):
    return unicode(self.name)

or, if self.name is a byte string, better still return self.name.decode(encoding), where encoding is the encoding of self.name.
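A minimal sketch of that advice. The class name `Person` and the choice of UTF-8 inside `__str__` are illustrative assumptions, and the Python 3 branch was added only so the sketch also runs there. `__unicode__` returns text, while `__str__` returns bytes on Python 2, as its contract requires.

```python
import sys

PY2 = sys.version_info[0] == 2


class Person(object):
    """Hypothetical example class; `name` is stored as a unicode (text) string."""

    def __init__(self, name):
        self.name = name

    def __unicode__(self):
        # Python 2 uses this for unicode(obj) and for u"{}".format(obj).
        return self.name

    def __str__(self):
        if PY2:
            # Python 2's str() must return bytes, so encode explicitly.
            # UTF-8 is an assumed choice of external encoding.
            return self.name.encode("utf-8")
        # On Python 3, str is already text, so no encoding is needed.
        return self.name


p = Person(u"\u00f1\u00f1\u00f1\u00f1")  # the text u"ññññ", written with escapes
text = u"{}: {}.".format(u"Name", p)
print(repr(text))
```

With a unicode template, Python 2's `format` asks the object for its unicode form, so `__unicode__` is used and no implicit ascii encoding takes place.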

Also, never mix unicode strings and byte strings without an explicit encoding. The correct way is therefore:

'{}: {}.'.format(message, unicode(object).encode(encoding))

Here again, encoding represents what you want for the external representation. Common encodings are Latin-1 or cp1252 on Windows, and often UTF-8 on Linux.
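To illustrate why the encoding must be chosen explicitly, a short sketch (runs on Python 2.7 and 3; the sample text is an assumption matching the question): the same text yields different bytes depending on the codec, and encoding with ascii reproduces the question's error. Note that `\xf1` is the single Latin-1 byte for "ñ" (code point U+00F1).

```python
name = u"\u00f1\u00f1\u00f1\u00f1"  # the text u"ññññ", written with escapes

# Encode explicitly at the boundary; the bytes depend on the chosen codec.
utf8_bytes = name.encode("utf-8")      # 2 bytes per character: b"\xc3\xb1" each
latin1_bytes = name.encode("latin-1")  # 1 byte per character: b"\xf1" each

print(repr(utf8_bytes))
print(repr(latin1_bytes))

# Decoding with the same codec round-trips back to the original text.
assert utf8_bytes.decode("utf-8") == name
assert latin1_bytes.decode("latin-1") == name

# The ascii codec cannot represent these characters at all, which gives
# the same kind of UnicodeEncodeError the question reports.
try:
    name.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError as exc:
    ascii_failed = True
    print(exc)
```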

Upvotes: 3

Happy Boy

Reputation: 536

I recommend using the decode and encode methods, as follows:

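# -*- coding: utf-8 -*-
# (Python 2 needs this declaration for the non-ASCII literal below)

class A(object):
    def __str__(self):
        # Decode the UTF-8 byte literal into a unicode string
        return "速度快".decode("utf-8", "ignore")

obj = A()
print u"{}".format(obj)  # the u prefix makes the template a unicode string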

The key is to add the u prefix to the format string.
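One caveat with the "ignore" error handler used above: if the bytes are not actually UTF-8, the undecodable bytes are silently dropped instead of raising an error. A small sketch (runs on Python 2.7 and 3), assuming the input is "ññññ" encoded as Latin-1:

```python
latin1_bytes = b"\xf1\xf1\xf1\xf1"  # "ññññ" encoded as Latin-1, not UTF-8

# With "ignore", invalid UTF-8 bytes are discarded without any error.
lossy = latin1_bytes.decode("utf-8", "ignore")

# With the default "strict" handler, the mismatch is reported instead.
try:
    latin1_bytes.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# Decoding with the encoding the bytes were actually written in works.
correct = latin1_bytes.decode("latin-1")
print(repr(lossy), strict_failed, repr(correct))
```

Using the default strict handler makes an encoding mismatch visible rather than quietly losing characters.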

Upvotes: 0
