user684970

Reputation:

How do you store a non-ASCII character in the Google App Engine Datastore?

I've tried no fewer than five different "solutions" and I can't get any of them to work. Please help.

This is the error:

  'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)
  Traceback (most recent call last):
  File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/webapp/__init__.py", line 636, in __call__
    handler.post(*groups)
  File "/base/data/home/apps/elmovieplace/1.350096827241428223/script/pftv.py", line 114, in post
    movie.put()
  File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/db/__init__.py", line 984, in put
    return datastore.Put(self._entity, config=config)
  File "/base/python_runtime/python_lib/versions/1/google/appengine/api/datastore.py", line 455, in Put
    return _GetConnection().async_put(config, entities, extra_hook).get_result()
  File "/base/python_runtime/python_lib/versions/1/google/appengine/datastore/datastore_rpc.py", line 1219, in async_put
    for pbs in pbsgen:
  File "/base/python_runtime/python_lib/versions/1/google/appengine/datastore/datastore_rpc.py", line 1070, in __generate_pb_lists
    pb = value_to_pb(value)
  File "/base/python_runtime/python_lib/versions/1/google/appengine/api/datastore.py", line 239, in entity_to_pb
    return entity._ToPb()
  File "/base/python_runtime/python_lib/versions/1/google/appengine/api/datastore.py", line 841, in _ToPb
    properties = datastore_types.ToPropertyPb(name, values)
  File "/base/python_runtime/python_lib/versions/1/google/appengine/api/datastore_types.py", line 1672, in ToPropertyPb
    pbvalue = pack_prop(name, v, pb.mutable_value())
  File "/base/python_runtime/python_lib/versions/1/google/appengine/api/datastore_types.py", line 1485, in PackString
    pbvalue.set_stringvalue(unicode(value).encode('utf-8'))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)

This is the part of the code that's giving me problems.

if imdbValues[5] == 'N/A':
    movie.director = ''
else:
    movie.director = imdbValues[5]

...

movie.put()

In this case imdbValues[5] is equal to Claudio Fäh

Upvotes: 3

Views: 2964

Answers (2)

systempuntoout

Reputation: 74064

The exception is raised by this line of code:

pbvalue.set_stringvalue(unicode(value).encode('utf-8'))

When you assign a value to movie.director, that value is first converted to unicode with:

unicode(value)

then it is encoded with encode('utf-8').

The unicode() function typically uses ASCII as its default encoding when decoding byte strings, which means you are safe only when passing one of these kinds of values:

  1. A unicode string
  2. A byte string that contains only 7-bit ASCII characters

Your code is probably passing a byte string that contains non-ASCII bytes in some encoding, which unicode(value) fails to decode as ASCII.
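
A minimal sketch of the failure described above. It assumes the value is a UTF-8 encoded byte string; the sample bytes are an illustration, not the asker's actual data. In Python 2 with the default settings, unicode(value) on a byte string behaves like value.decode('ascii'), so the explicit decode is used here to reproduce the error in a form that also runs on Python 3. The helper name try_ascii_decode is hypothetical.

```python
# Hypothetical sample: "Claudio Fäh" as UTF-8 encoded bytes.
raw = b'Claudio F\xc3\xa4h'


def try_ascii_decode(data):
    """Return the decoded text, or a message describing the decode error."""
    try:
        return data.decode('ascii')
    except UnicodeDecodeError as exc:
        return 'failed: %s' % exc


print(try_ascii_decode(b'Claudio'))  # pure ASCII bytes decode without error
print(try_ascii_decode(raw))         # the non-ASCII byte 0xc3 triggers the error
```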

Recommendation:
if you are dealing with byte strings, you MUST know their encoding or your program will suffer this kind of encoding/decoding problem.

How to fix it:
Find out which encoding is used by the byte strings you are dealing with (UTF-8?) and convert them to unicode strings.
If, for example, imdbValues is a list returned by some IMDb Python library and contains UTF-8 encoded byte strings, you should convert them using:

 movie.director = imdbValues[5].decode('utf-8')
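
If the list might contain a mix of byte strings and unicode strings, a small guard can avoid decoding values that are already text. This is only a sketch: the helper name to_text is hypothetical, and UTF-8 is an assumption about the source data, not something the question confirms.

```python
def to_text(value, encoding='utf-8'):
    """Decode byte strings to text; return text values unchanged.

    The default encoding is an assumption; pass the real one if it differs.
    """
    if isinstance(value, bytes):  # on Python 2, bytes is an alias for str
        return value.decode(encoding)
    return value


print(repr(to_text(b'Claudio F\xc3\xa4h')))  # byte string gets decoded
print(repr(to_text(u'Claudio F\xe4h')))      # text passes through untouched
```

With such a helper, the assignment in the question would become movie.director = to_text(imdbValues[5]).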

Upvotes: 4

tzot

Reputation: 95911

You should start using unicode for your textual data.

Wherever you get your data from, it arrives as Unicode characters encoded as bytes. The encoding could be UTF-8, UTF-16, Windows-1252, ISO-8859-1, or one of many others. If the data exists on your system, you know the encoding. If it came from a web page, the encoding is included in the response headers, and often at the beginning of the page. Using that encoding, .decode the bytes into the very useful unicode Python object and use that in your code.

Decode on input, encode (if necessary) on output. It's not necessary to encode before using the data with App Engine.
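
As a rough illustration of "decode on input, encode on output", the sketch below reads the charset from a Content-Type header value and uses it to decode the body. The function name charset_from_content_type, the sample header, the sample body, and the UTF-8 fallback are all assumptions made for the example.

```python
def charset_from_content_type(header, default='utf-8'):
    """Extract the charset parameter from a Content-Type header value.

    Falls back to `default` (an assumption) when no charset is given.
    """
    for part in header.split(';')[1:]:
        name, _, value = part.strip().partition('=')
        if name.lower() == 'charset' and value:
            return value.strip('"\'')
    return default


# Hypothetical input: a header and body as they might arrive from a web page.
content_type = 'text/html; charset=ISO-8859-1'
body = b'Claudio F\xe4h'  # "ä" is the single byte 0xe4 in ISO-8859-1

# Decode on input: bytes become text as soon as they enter the program.
text = body.decode(charset_from_content_type(content_type))

# Encode on output, and only if the consumer needs bytes.
utf8_bytes = text.encode('utf-8')
print(repr(text), repr(utf8_bytes))
```

Inside the program, only the decoded text is passed around; that is also the value to assign to the datastore property.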

PS that answer in a Unicode-related question might be of help.

Upvotes: 2
