Tabrez Ahmed

Reputation: 2950

Charset/Encoding error while Inserting to Google Datastore

I know this question looks like an exact duplicate of at least a dozen others, but I am posting it because none of the answers to those questions solved my problem.

Basically, I want to fetch the content of a website that contains characters in various languages and insert it into the datastore. But whatever I try, the error does not go away.

My sample code:

import urllib2

import webapp2
from google.appengine.ext import db


class URLEntry(db.Model):
    content = db.TextProperty()


class ViewURL(webapp2.RequestHandler):
    def get(self):
        url = "http://iitk.ac.in/"
        try:
            result = urllib2.urlopen(url)
        except urllib2.URLError, e:
            handleError(e)  # assumed to be defined elsewhere in the app
            return  # nothing to store if the fetch failed
        content = result.read()  # raw bytes, not unicode
        e = URLEntry(key_name=url, content=content)
        URLEntry.get_or_insert(url, content=content)  # Probably this line generates the error.

It throws the error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 25554: ordinal not in range(128)

Traceback:

'ascii' codec can't decode byte 0xe0 in position 25554: ordinal not in range(128)
    Traceback (most recent call last):
      File "/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1511, in __call__
        rv = self.handle_exception(request, response, e)
      File "/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1505, in __call__
        rv = self.router.dispatch(request, response)
      File "/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1253, in default_dispatcher
        return route.handler_adapter(request, response)
      File "/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1077, in __call__
        return handler.dispatch()
      File "/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 547, in dispatch
        return self.handle_exception(e, self.app.debug)
      File "/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 545, in dispatch
        return method(*args, **kwargs)
      File "/base/data/home/apps/s~govt-jobs/1.368125505627581007/checkforurls.py", line 83, in get
        URLEntry.get_or_insert(url,content=result.content)
      File "/python27_runtime/python27_lib/versions/1/google/appengine/ext/db/__init__.py", line 1362, in get_or_insert
        return run_in_transaction(txn)
      File "/python27_runtime/python27_lib/versions/1/google/appengine/api/datastore.py", line 2461, in RunInTransaction
        return RunInTransactionOptions(None, function, *args, **kwargs)
      File "/python27_runtime/python27_lib/versions/1/google/appengine/api/datastore.py", line 2599, in RunInTransactionOptions
        ok, result = _DoOneTry(new_connection, function, args, kwargs)
      File "/python27_runtime/python27_lib/versions/1/google/appengine/api/datastore.py", line 2621, in _DoOneTry
        result = function(*args, **kwargs)
      File "/python27_runtime/python27_lib/versions/1/google/appengine/ext/db/__init__.py", line 1359, in txn
        entity = cls(key_name=key_name, **kwds)
      File "/python27_runtime/python27_lib/versions/1/google/appengine/ext/db/__init__.py", line 970, in __init__
        prop.__set__(self, value)
      File "/python27_runtime/python27_lib/versions/1/google/appengine/ext/db/__init__.py", line 614, in __set__
        value = self.validate(value)
      File "/python27_runtime/python27_lib/versions/1/google/appengine/ext/db/__init__.py", line 2798, in validate
        value = self.data_type(value)
      File "/python27_runtime/python27_lib/versions/1/google/appengine/api/datastore_types.py", line 1163, in __new__
        return super(Text, cls).__new__(cls, arg, encoding)
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 25554: ordinal not in range(128)

Also as suggested by other StackOverflow answers, I tried adding the following before trying to insert to datastore:

content = content.decode("ISO-8859-1") # The encoding of the page is ISO-8859-1
content = content.encode("utf-8")

But the error persists. Please help.

Upvotes: 0

Views: 134

Answers (1)

voscausa

Reputation: 11706

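decode converts a byte string into a unicode string, interpreting the bytes with the codec you provide. encode goes the other way around: it converts a unicode string into bytes.

content = content.encode("utf-8") # turns a unicode string into UTF-8 bytes

The datastore stores text as UTF-8, but a TextProperty wants a unicode value. If you give it a byte string, it tries to decode it as ASCII, which is the error in your traceback. So decode the page content and leave out the encode step.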

Take a look at this great blog post by Nick Johnson: http://blog.notdot.net/2010/07/Getting-unicode-right-in-Python
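To make the decode/encode direction concrete, here is a minimal sketch that runs outside App Engine. It assumes the page really is ISO-8859-1, as stated in the question. The helper name `to_unicode` and the sample bytes are made up for illustration and are not part of any library.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function


def to_unicode(raw, encoding="iso-8859-1"):
    """Hypothetical helper: decode fetched bytes into a unicode string."""
    return raw.decode(encoding)


# Stand-in for result.read(): bytes containing 0xe0, assumed ISO-8859-1.
raw = b"voil\xe0"

# A byte string handed to the datastore gets decoded as ASCII, which fails.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

text = to_unicode(raw)              # bytes -> unicode
utf8_bytes = text.encode("utf-8")   # unicode -> bytes again

# Re-encoding gives a byte string back, so an ASCII decode fails once more.
try:
    utf8_bytes.decode("ascii")
    reencoded_ok = True
except UnicodeDecodeError:
    reencoded_ok = False

print(ascii_ok, reencoded_ok)  # False False

# In the handler, the unicode value would be passed straight through:
#     URLEntry.get_or_insert(url, content=to_unicode(result.read()))
```

The second check shows why decoding and then encoding to UTF-8 does not help: the value is a byte string again by the time it reaches the property.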

Upvotes: 1
