Reputation: 101
I'm reading a UTF-8 encoded file. When I print the text directly, everything is fine. When I print the text from a class using msg.__str__(), it works too.
But I really don't know how to print it with just str(msg), because this always raises the error "'ascii' codec can't encode character u'\xe4' in position 10: ordinal not in range(128)" if the text contains an umlaut.
Example Code:
#!/usr/bin/env python
# encoding: utf-8
import codecs
from TempClass import TempClass

file = codecs.open("person.txt", encoding="utf-8")
message = file.read()  # I am Mr. Händler.
#works
print message
msg = TempClass(message)
#works
print msg.__str__()
#works
print msg.get_string()
#error
print str(msg)
And the class:
class TempClass(object):
    def __init__(self, text):
        self.text = text

    def get_string(self):
        return self.text

    def __str__(self):
        return self.text
I tried to decode and encode the text in several ways but nothing works for me.
Help? :)
Edit: I am using Python 2.7.9
Upvotes: 0
Views: 424
Reputation: 116
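Because message (and msg.text) are not str but unicode objects. In Python 2, str() needs __str__ to return a str (a byte string); when __str__ returns unicode instead, the result is implicitly encoded with the default ASCII codec, which fails on the umlaut. To make str() work, encode the text as UTF-8 explicitly. Your __str__ method should look like: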
    def __str__(self):
        return self.text.encode('utf-8')
unicode can be implicitly encoded to str if it contains only ASCII characters, which is why you only see the error when the input contains an umlaut.
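To see the difference in isolation, here is a minimal sketch that performs the two encodings by hand. It assumes the sample sentence from the question as input; the variable names (text, ascii_ok, error_position, utf8_bytes) are made up for illustration. The explicit encode("ascii") call stands in for the implicit conversion that str() performs in Python 2.

```python
# -*- coding: utf-8 -*-
# Sample text from the question; \xe4 is the umlaut "ä".
text = u"I am Mr. H\xe4ndler."

# Roughly what str() attempts implicitly in Python 2 when __str__
# returns a unicode object: encode with the ASCII codec.
try:
    text.encode("ascii")
    ascii_ok = True
    error_position = None
except UnicodeEncodeError as exc:
    ascii_ok = False
    error_position = exc.start  # index of the first character that cannot be encoded

# What the suggested __str__ does instead: encode explicitly as UTF-8.
utf8_bytes = text.encode("utf-8")
```

The failing index is the same position reported in the error message from the question, while the UTF-8 encoding succeeds and represents the umlaut as two bytes.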
Upvotes: 1