Michael Soulier

Reputation: 821

Baffled by Python __str__ method and Unicode behaviour

This is Python 2.7. Don't judge me. :)

I have a Django application where a model class is being used for a bit of auditing. For historical reasons I have a __str__ method in this class, and I am trying to return something useful.

   def __str__(self):
      return "%s %s" % (self.guid, self.setside_username)

Now, this failed with non-ASCII characters in setside_username, because the audit log was indirectly calling __str__ like so:

log.info("change to %s", obj)

I tried renaming __str__ to __unicode__ but it still failed in the same location. So I tried to sanitize the string by encoding it as ASCII and having the encoder replace anything it didn't understand.

def __str__(self):
   return "%s %s" % (self.guid, self.setside_username.encode('ascii', 'replace'))

but that line fails with a UnicodeDecodeError, which baffles me because I thought that call would tell the encoder to replace anything it doesn't understand.

So to prove that I don't understand the difference between encode() and decode(), I s/encode/decode and suddenly the error is gone.

And I haven't a clue why. I thought decode created unicode objects and encode created byte strings, so why would decode on a unicode object help here?

Worse, my little test script that simply prints the object using a print statement is now failing!

    username = self.setside_username.decode('ascii', 'replace')
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 1: ordinal not in range(128)

So I fix one use case but break another.

I need to understand this to know that I'm actually fixing the problem and not just playing whack-a-mole until the backtraces go away.

Help appreciated.

Update: Moving to __unicode__ methods that return unicode.

Still seeing this.

Traceback (most recent call last):
  File "/usr/lib64/python2.6/logging/__init__.py", line 784, in emit
    msg = self.format(record)
  File "/usr/lib64/python2.6/logging/__init__.py", line 662, in format
    return fmt.format(record)
  File "/usr/lib64/python2.6/logging/__init__.py", line 444, in format
    record.message = record.getMessage()
  File "/usr/lib64/python2.6/logging/__init__.py", line 314, in getMessage
    msg = msg % self.args
  File "/etc/e-smith/web/django/teleworker/clients/models.py", line 364, in __unicode__
    return u"%s %s" % (self.guid, self.setside_username)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)
[03/Aug/2017 12:59:03.907] ERROR [MainThread] [tug-eventd.tug-eventd:1727] Error handling cluster event: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)

Upvotes: 1

Views: 942

Answers (1)

Martijn Pieters

Reputation: 1123520

You need to give your model a __unicode__ method that actually returns Unicode:

def __unicode__(self):
   return u"%s %s" % (self.guid, self.setside_username)

Note the u prefix: this uses a Unicode literal, and it does not encode the username.
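
If a UnicodeDecodeError still shows up with a Unicode format string, as in the traceback in the update, one possible cause is that the value being interpolated is an encoded byte string rather than a unicode object. The 0xc3 byte in that traceback hints at UTF-8 data, but that is an inference, not something the post confirms. Here is a minimal sketch of that situation, written so it runs on Python 2 or 3. The sample value and the names raw_username and to_text are hypothetical, and the implicit ASCII decode that Python 2 performs is spelled out explicitly.

```python
# -*- coding: utf-8 -*-
# Hypothetical sample: the text u"h\xe9" stored as UTF-8 bytes.
raw_username = u"h\xe9".encode("utf-8")  # the bytes 'h', 0xc3, 0xa9


def to_text(value, encoding="utf-8"):
    """Return value as Unicode text, decoding byte strings explicitly.

    The UTF-8 default is an assumption about how the data was stored.
    """
    if isinstance(value, bytes):  # bytes is an alias of str on Python 2
        return value.decode(encoding)
    return value


# Python 2 does this step implicitly when a byte string is mixed into a
# unicode format string; done explicitly, it fails the same way.
try:
    raw_username.decode("ascii")
    implicit_error = None
except UnicodeDecodeError as exc:
    implicit_error = exc

# Decoding with the right codec first keeps everything as text.
label = u"%s %s" % (u"guid-1", to_text(raw_username))
```

The general idea is to decode bytes to Unicode once, where the data enters the program, and to encode only where it leaves, so that __unicode__ never sees byte strings.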

The Django model base class provides a __str__ method that will take the __unicode__ output and encode it to a bytestring for you.
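
The division of labour can be sketched with a plain class. This is an illustration only, not Django's actual code: AuditedClient is a hypothetical stand-in for the model, and UTF-8 is assumed as the output encoding. The methods are called directly rather than through str(), so the snippet behaves the same on Python 3, where str() must return text.

```python
# -*- coding: utf-8 -*-


class AuditedClient(object):
    """Hypothetical stand-in for the model; not the real Django class."""

    def __init__(self, guid, setside_username):
        self.guid = guid
        self.setside_username = setside_username

    def __unicode__(self):
        # Text in, text out: no encode() or decode() calls here.
        return u"%s %s" % (self.guid, self.setside_username)

    def __str__(self):
        # Encode the Unicode result once, at the boundary.
        # UTF-8 is an assumption made for this sketch.
        return self.__unicode__().encode("utf-8")


client = AuditedClient(u"guid-1", u"h\xe9")
text_form = client.__unicode__()  # Unicode text
byte_form = client.__str__()      # UTF-8 encoded byte string
```

With this split, only __unicode__ needs to be written by hand, and it never has to deal with encodings.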

Upvotes: 2
