UnicodeEncodeError when __str__ already returns unicode

Question

We have the following formatted string:

'{}: {}.'.format(message, object)

Which raises:

UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)

The object's string is non-ASCII, but the method is overridden so that it returns a unicode string:

def __str__(self):
    return unicode(self.name)

Why then is a UnicodeEncodeError being raised? What can I do to fix it?

I have tried turning the string into a unicode one:

u'{}: {}.'.format(message, object)

But that messes up the object's string. It returns \xf1\xf1\xf1\xf1 instead of ññññ.

Serge Ballesta · Accepted Answer

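In Python 2, normal strings are byte strings, so __str__ should never return a unicode string: doing so breaks the str contract. When str() gets a unicode result back from __str__, Python 2 implicitly encodes it with the default ASCII codec, and that is what raises the UnicodeEncodeError. If you need a unicode conversion for your object, use the __unicode__ special method: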

def __unicode__(self):
    return unicode(self.name)

or, even better, return self.name.decode(encoding), where encoding is the encoding of self.name (this assumes self.name is a byte string).
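As a rough illustration of the decode variant, here is a minimal sketch. The Person class, its encoding attribute, and the UTF-8 sample bytes are assumptions made up for the example, not part of the original code. The method is called directly so the snippet does not depend on the unicode() builtin.

```python
# -*- coding: utf-8 -*-

class Person(object):
    """Hypothetical stand-in for the object in the question."""

    def __init__(self, name, encoding='utf-8'):
        self.name = name          # assumed to be a byte string
        self.encoding = encoding  # assumed encoding of self.name

    def __unicode__(self):
        # Decode explicitly instead of relying on the implicit ASCII default.
        return self.name.decode(self.encoding)


person = Person(b'\xc3\xb1\xc3\xb1\xc3\xb1\xc3\xb1')  # UTF-8 bytes for four ñ
text = person.__unicode__()  # on Python 2, unicode(person) calls this method
```

Keeping the encoding next to the raw bytes means the object, not the caller, decides how its name is decoded.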

And never mix unicode strings and byte strings without explicit encoding. So the correct way is:

'{}: {}.'.format(message, unicode(object).encode(encoding))

Here again, encoding represents what you want for the external representation. Common encodings are Latin-1 or cp1252 on Windows, and often UTF-8 on Linux.
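To see why the choice of encoding matters, the sketch below encodes the text from the question with the encodings mentioned above, and with ASCII for comparison. The variable names are illustrative only.

```python
# -*- coding: utf-8 -*-

name = u'\xf1\xf1\xf1\xf1'  # the unicode text from the question (four ñ)

# The same text yields different bytes depending on the target encoding.
latin1_bytes = name.encode('latin-1')  # one byte per character
cp1252_bytes = name.encode('cp1252')   # same bytes as Latin-1 for this text
utf8_bytes = name.encode('utf-8')      # two bytes per character

# ASCII cannot represent these characters at all, which is the failure
# reported in the question.
try:
    name.encode('ascii')
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = exc
```

Here ascii_error.start and ascii_error.end are 0 and 4, which corresponds to the "position 0-3" in the reported message.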
