ebaharilikult

Reputation: 101

UnicodeEncodeError only with str(text) in Python

I'm reading a UTF-8 encoded file. When I print the text directly, everything is fine. When I print the text from a class using msg.__str__(), it works too. But I can't figure out how to print it with str(msg): whenever the text contains an umlaut, this raises the error "'ascii' codec can't encode character u'\xe4' in position 10: ordinal not in range(128)".

Example Code:

 #!/usr/bin/env python
 # encoding: utf-8

 import codecs
 from TempClass import TempClass

 file = codecs.open("person.txt", encoding="utf-8")
 message = file.read()  # I am Mr. Händler.

 #works
 print message

 msg = TempClass(message)
 #works
 print msg.__str__()
 #works
 print msg.get_string()

 #error
 print str(msg)

And the class:

class TempClass(object):

    def __init__(self, text):
        self.text = text

    def get_string(self):
        return self.text

    def __str__(self):
        return self.text

I tried to decode and encode the text in several ways but nothing works for me.

Help? :)

Edit: I am using Python 2.7.9

Upvotes: 0

Views: 424

Answers (1)

rezca

Reputation: 116

Because message (and therefore msg.text) is not a str but a unicode object: codecs.open decodes the file contents for you. str() has to produce a byte string, so you need to encode the text as UTF-8 again yourself. Your __str__ method should look like:

def __str__(self):
    return self.text.encode('utf-8')

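In Python 2, a unicode object returned from __str__ is implicitly encoded to str with the default 'ascii' codec. That succeeds only if the text contains nothing but ASCII characters, which is why you only see the error when the input contains an umlaut. Calling msg.__str__() directly skips that step: it returns the unicode object unchanged, and print then encodes it using your terminal's encoding.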
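To make the difference visible, here is a minimal sketch that performs the ASCII encode by hand instead of letting str() do it. The helper name implicit_str is made up for this illustration and is not part of Python. The explicit .encode("ascii") call is my stand-in for what Python 2.7 does behind the scenes, and the snippet is written so it should also run on Python 3.

```python
# Sketch: imitate the implicit unicode -> str conversion that Python 2's
# str() applies when __str__ returns a unicode object.

text = u"I am Mr. H\xe4ndler."  # same text as in person.txt; \xe4 is "ä"


def implicit_str(value):
    """Hypothetical helper: encode with the 'ascii' codec, as str() would."""
    return value.encode("ascii")


# Text with an umlaut cannot be encoded as ASCII.
try:
    implicit_str(text)
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    error_position = exc.start  # index of the first non-ASCII character

# Pure-ASCII text goes through the same step without any error.
ascii_bytes = implicit_str(u"I am Mr. Smith.")

# Encoding explicitly as UTF-8 works for any text, which is why
# returning self.text.encode('utf-8') from __str__ fixes the problem.
utf8_bytes = text.encode("utf-8")
```

The position reported by the exception lines up with the "position 10" in the error message from the question, since the "ä" is the eleventh character of the text.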

Upvotes: 1
