smith324

Reputation: 13060

Unicode characters with BlobStore in App Engine

Is there a way to store unicode data with App Engine's BlobStore (in Python)?

I'm saving the data like this:

file_name = files.blobstore.create(mime_type='application/octet-stream')
with files.open(file_name, 'a') as f:
    f.write('<as><a>' + '</a><a>'.join(stringInUnicode) + '</a></as>')

But on the production server (not the development server) I get the error below. Something seems to be converting my Unicode into ASCII, and I don't know why.

Why is it trying to convert back to ASCII? Can I avoid this?

Traceback (most recent call last):
 File "/base/data/home/apps/myapp/1.349473606437967000/myfile.py", line 137, in get
   f.write('<as><a>' + '</a><a>'.join(stringInUnicode) + '</a></as>')
 File "/base/python_runtime/python_lib/versions/1/google/appengine/api/files/file.py", line 364, in write
   self._make_rpc_call_with_retry('Append', request, response)
 File "/base/python_runtime/python_lib/versions/1/google/appengine/api/files/file.py", line 472, in _make_rpc_call_with_retry
   _make_call(method, request, response)
 File "/base/python_runtime/python_lib/versions/1/google/appengine/api/files/file.py", line 226, in _make_call
   rpc.make_call(method, request, response)
 File "/base/python_runtime/python_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 509, in make_call
   self.__rpc.MakeCall(self.__service, method, request, response)
 File "/base/python_runtime/python_lib/versions/1/google/appengine/api/apiproxy_rpc.py", line 115, in MakeCall
   self._MakeCallImpl()
 File "/base/python_runtime/python_lib/versions/1/google/appengine/runtime/apiproxy.py", line 161, in _MakeCallImpl
   self.request.Output(e)
 File "/base/python_runtime/python_lib/versions/1/google/net/proto/ProtocolBuffer.py", line 204, in Output
   self.OutputUnchecked(e)
 File "/base/python_runtime/python_lib/versions/1/google/appengine/api/files/file_service_pb.py", line 2390, in OutputUnchecked
   out.putPrefixedString(self.data_)
 File "/base/python_runtime/python_lib/versions/1/google/net/proto/ProtocolBuffer.py", line 432, in putPrefixedString
   v = str(v)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 313: ordinal not in range(128)

Upvotes: 2

Views: 714

Answers (1)

bobince

Reputation: 536557

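A blob store contains binary data: bytes, not characters. The traceback ends at v = str(v) in ProtocolBuffer.py; on Python 2, calling str() on a unicode object encodes it with the default ASCII codec, which is why u'\xe9' fails. So you have to do an explicit encode step yourself before writing. UTF-8 seems as good an encoding as any.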
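As a minimal sketch of that encode step: the sample strings below are made up, and io.BytesIO is only a stand-in for the handle that files.open(file_name, 'a') would return, so this illustrates the idea rather than the exact App Engine call.

```python
import io

# Sample unicode strings (hypothetical data for illustration).
items = [u'caf\xe9', u'na\xefve']

# Encoding to ASCII is what str() does implicitly to unicode on Python 2;
# it fails for any character outside the 0-127 range.
try:
    items[0].encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Build the text as unicode, then encode once so only bytes reach write().
payload = (u'<as><a>' + u'</a><a>'.join(items) + u'</a></as>').encode('utf-8')

f = io.BytesIO()  # stand-in for the blobstore file handle (assumption)
f.write(payload)
```

Reading the blob back then needs the matching decode('utf-8') to recover the original text.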

f.write('<as><a>' + '</a><a>'.join(stringInUnicode) + '</a></as>')

This will go wrong if an item in stringInUnicode contains <, & or ]]> sequences. You'll want to do some escaping (either using a proper XML library to serialise the data, or manually):

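with files.open(file_name, 'a') as f:
    f.write('<as>')
    for line in stringInUnicode:
        # Escape XML markup characters; '&' must be replaced first.
        line = line.replace(u'&', u'&amp;').replace(u'<', u'&lt;').replace(u'>', u'&gt;')
        # Encode to UTF-8 so that only byte strings reach f.write().
        f.write('<a>%s</a>' % line.encode('utf-8'))
    f.write('</as>')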

(This will still be ill-formed XML if the strings ever include control characters, but there's not much you can do about that. If you need to store arbitrary binary data in XML, you'd need some ad-hoc encoding such as Base64 on top.)
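For the "proper XML library" route, here is a rough sketch using the standard library's xml.etree.ElementTree. The helper name serialise_items and the sample strings are hypothetical; the element names as and a are taken from the question.

```python
import xml.etree.ElementTree as ET


def serialise_items(items):
    """Return UTF-8 encoded XML bytes with one <a> element per item."""
    root = ET.Element('as')
    for item in items:
        # ElementTree escapes &, < and > in text when serialising.
        ET.SubElement(root, 'a').text = item
    return ET.tostring(root, encoding='utf-8')


# Sample data (hypothetical), including the troublesome sequences.
items = [u'caf\xe9', u'a < b & c', u'x ]]> y']
xml_bytes = serialise_items(items)

# Parsing the bytes back should recover the original strings.
parsed = [el.text for el in ET.fromstring(xml_bytes)]
```

The result is already a byte string, so it could be passed to f.write() without a separate encode call.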

Upvotes: 5
