Reputation: 13060
Is there a way to store unicode data with App Engine's BlobStore (in Python)?
I'm saving the data like this:
    file_name = files.blobstore.create(mime_type='application/octet-stream')
    with files.open(file_name, 'a') as f:
        f.write('<as><a>' + '</a><a>'.join(stringInUnicode) + '</a></as>')
But on the production (not development) server I'm getting this error. It seems to be converting my Unicode into ASCII and I don't know why.
Why is it trying to convert back to ASCII? Can I avoid this?
Traceback (most recent call last):
File "/base/data/home/apps/myapp/1.349473606437967000/myfile.py", line 137, in get
f.write('<as><a>' + '</a><a>'.join(stringInUnicode) + '</a></as>')
File "/base/python_runtime/python_lib/versions/1/google/appengine/api/files/file.py", line 364, in write
self._make_rpc_call_with_retry('Append', request, response)
File "/base/python_runtime/python_lib/versions/1/google/appengine/api/files/file.py", line 472, in _make_rpc_call_with_retry
_make_call(method, request, response)
File "/base/python_runtime/python_lib/versions/1/google/appengine/api/files/file.py", line 226, in _make_call
rpc.make_call(method, request, response)
File "/base/python_runtime/python_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 509, in make_call
self.__rpc.MakeCall(self.__service, method, request, response)
File "/base/python_runtime/python_lib/versions/1/google/appengine/api/apiproxy_rpc.py", line 115, in MakeCall
self._MakeCallImpl()
File "/base/python_runtime/python_lib/versions/1/google/appengine/runtime/apiproxy.py", line 161, in _MakeCallImpl
self.request.Output(e)
File "/base/python_runtime/python_lib/versions/1/google/net/proto/ProtocolBuffer.py", line 204, in Output
self.OutputUnchecked(e)
File "/base/python_runtime/python_lib/versions/1/google/appengine/api/files/file_service_pb.py", line 2390, in OutputUnchecked
out.putPrefixedString(self.data_)
File "/base/python_runtime/python_lib/versions/1/google/net/proto/ProtocolBuffer.py", line 432, in putPrefixedString
v = str(v)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 313: ordinal not in range(128)
Upvotes: 2
Views: 714
Reputation: 536557
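A blob store contains binary data: bytes, not characters. The traceback shows the API calling str(v) on your data, and in Python 2 that implicitly encodes a unicode object with the ASCII codec, which fails on u'\xe9'. So you're going to have to do an explicit encode step of some sort before writing; utf-8 seems as good an encoding as any.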
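As a minimal sketch of that encode step: build the whole text as unicode, then encode it once just before writing. The helper name build_payload and the sample items are made up for illustration, and no XML escaping is done at this point.

```python
def build_payload(items):
    """Join unicode items into the <as><a>...</a></as> layout, return UTF-8 bytes.

    Hypothetical helper for illustration only; it does not escape markup.
    """
    # Keep everything as unicode text until the very last step.
    text = u'<as><a>' + u'</a><a>'.join(items) + u'</a></as>'
    # Encode explicitly so nothing downstream falls back to ASCII.
    return text.encode('utf-8')


payload = build_payload([u'caf\xe9', u'plain'])
# payload is a byte string, so it can be handed to f.write(payload) as-is.
```

Encoding once at the boundary keeps the rest of the code working with unicode text, and only the write call ever sees bytes.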
f.write('<as><a>' + '</a><a>'.join(stringInUnicode) + '</a></as>')
This will go wrong if an item in stringInUnicode contains <, & or ]]> sequences. You'll want to do some escaping (either using a proper XML library to serialise the data, or manually):
    with files.open(file_name, 'a') as f:
        f.write('<as>')
        for line in stringInUnicode:
            # Escape XML markup characters; '&' must be replaced first.
            line = line.replace(u'&', u'&amp;').replace(u'<', u'&lt;').replace(u'>', u'&gt;')
            f.write('<a>%s</a>' % line.encode('utf-8'))
        f.write('</as>')
(This will still be ill-formed XML if the strings ever include control characters, but there's not so much you can do about that. If you need to store arbitrary binary in XML you'd need some ad-hoc encoding such as base-64 on top.)
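For the "proper XML library" route, here is a rough sketch using the standard library's xml.etree.ElementTree, which does the escaping and the UTF-8 encoding in one go. The function name serialise_items and the sample strings are assumptions for illustration; the element names follow the <as>/<a> layout from the question.

```python
import xml.etree.ElementTree as ET


def serialise_items(items):
    """Serialise unicode items as <as><a>...</a></as> and return UTF-8 bytes.

    Hypothetical helper; ElementTree escapes &, < and > in element text.
    """
    root = ET.Element('as')
    for item in items:
        # Assigning .text leaves escaping to the serialiser.
        ET.SubElement(root, 'a').text = item
    # With an explicit byte encoding, tostring() returns a byte string.
    return ET.tostring(root, encoding='utf-8')


items = [u'caf\xe9', u'a < b & c', u'x ]]> y']
payload = serialise_items(items)

# Parsing the bytes back should give the original strings.
round_tripped = [el.text for el in ET.fromstring(payload)]
```

The resulting bytes can be passed to f.write() directly. Note that this does not help with control characters either; those would still need something like base-64 on top.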
Upvotes: 5