Reputation: 77
I've been roaming around these forums asking questions about issues related to Python and UTF-8 encoding/decoding.
This time around I've stumbled upon something which initially seemed an easy problem.
In my previous question (http://stackoverflow.com/questions/7138797/problems-with-python-in-google-app-engine-utf-8-and-ascii) I asked how to ensure proper addition of UTF-8 strings to variables:
Messages.append(ChatMessage(chatter, msg))
The solution was something along these lines:
Messages.append(ChatMessage(chatter.encode( "utf-8" ), msg.encode( "utf-8" )))
Pretty simple.
However, now I am faced with the challenge of sending the data to the Google App Engine Datastore. The code from the book I was using (Code in the Cloud) looked as follows (I skipped the redundant parts):
#START: ChatMessage
class ChatMessage(db.Model):
    user = db.StringProperty(required=True)
    timestamp = db.DateTimeProperty(auto_now_add=True)
    message = db.TextProperty(required=True)

    def __str__(self):
        return "%s (%s): %s" % (self.user, self.timestamp, self.message)
#END: ChatMessage
# START: PostHandler
class ChatRoomPoster(webapp.RequestHandler):
    def post(self):
        chatter = self.request.get("name")
        msgtext = self.request.get("message")
        msg = ChatMessage(user=chatter, message=msgtext)
        msg.put() #<callout id="co.put"/>
        self.redirect('/')
# END: PostHandler
I thought that swapping a part of the PostHandler with the following bit:
msg = ChatMessage(user=chatter.encode( "utf-8" ), message=msgtext.encode( "utf-8" ))
... would do the trick. Unfortunately, that did not happen. I still keep getting
File "/base/data/home/apps/s~markcc-chatroom-one-pl/1.353054484690143927/pchat.py", line 147, in post
msg = ChatMessage(user=chatter.encode( "utf-8" ), message=msgtext.encode( "utf-8" ))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 0: ordinal not in range(128)
Naturally, I added the # -*- coding: utf-8 -*- declaration and put:
self.response.headers['Content-Type'] = 'text/html; charset=UTF-8'
in the file. It does nothing to alleviate the issue.
As you can see, I am not very well-versed in Python, and encoding/decoding problems are a bit of a novelty for me. I would appreciate your assistance. Could anyone explain where I went wrong in this case, and what practices I should use to avoid similar quandaries in the future? Thank you in advance.
Upvotes: 1
Views: 628
Reputation: 40330
encode turns unicode into bytes, and decode turns bytes into unicode. You have to be careful not to mix the two. Your error means that either chatter or msgtext is already bytes, and you are trying to encode it. One of the worst 'features' of Python 2 is that it lets you do this - it first tries to decode the bytes using ascii (the most limited encoding), and then re-encodes them with whatever you've asked for. This is fixed in Python 3, but you can't use that on App Engine.
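To make the hidden step visible, here is a minimal sketch that spells out by hand what Python 2 does implicitly when you call .encode on a bytestring. The byte values are an assumption, picked only so the first byte matches the 0xc4 in your traceback; they are not your actual data.

```python
# -*- coding: utf-8 -*-
# Assumed sample input: two bytes that form one UTF-8 encoded character.
raw = b"\xc4\x85"

error = None
try:
    # Python 2 runs the decode("ascii") step silently before encoding;
    # writing it out explicitly shows where the failure really comes from.
    raw.decode("ascii").encode("utf-8")
except UnicodeDecodeError as exc:
    error = exc

# The failure is raised by the ascii *decode*, even though only
# an *encode* was requested.
print(type(error).__name__)
```

That is why an encode call ends up reporting a UnicodeDecodeError.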
App Engine expects to store unicode, so you need to pass it a unicode string without encoding it. In fact, if your data is already in a bytestring, you would need to decode it before you can store it.
In short, the first thing to try is simply not calling .encode before you store the data.
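If you are not sure which kind of string you are holding, one option is a small helper that decodes only when it has to. This is a rough sketch, not part of App Engine: to_text is a hypothetical name, UTF-8 is an assumed input encoding, and the isinstance check relies on bytes being available as a name (it is an alias of str on Python 2.6 and later).

```python
# -*- coding: utf-8 -*-

def to_text(value, encoding="utf-8"):
    """Hypothetical helper: return value as a unicode string.

    Bytestrings are decoded with the given encoding (assumed UTF-8);
    values that are already unicode are returned unchanged.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

# Assumed sample inputs: the same character as bytes and as unicode.
from_bytes = to_text(b"\xc4\x85")
from_text = to_text(u"\u0105")

# Both paths end in the same unicode value, ready to be stored.
print(from_bytes == from_text)
```

With something like this in place you would pass to_text(chatter) and to_text(msgtext) to ChatMessage instead of calling .encode on them.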
(I may have pointed you to it before, but if not, please take the time to read this article about unicode)
Upvotes: 3