Reputation: 2950
I know this looks like a duplicate of at least a dozen other questions, but I am posting it only after finding that none of their answers solved my problem.
Basically, I want to fetch the content of a website that contains characters in various languages and insert it into the datastore. Whatever I try, the error persists.
My sample code:
class URLEntry(db.Model):
    content = db.TextProperty()
class ViewURL(webapp2.RequestHandler):
    def get(self):
        import urllib2
        url = "http://iitk.ac.in/"
        try:
            result = urllib2.urlopen(url)
        except urllib2.URLError, e:
            handleError(e)
        content = result.read()
        e = URLEntry(key_name=url,content=content)
        URLEntry.get_or_insert(url,content=content) #Probably this line generates the error.
This throws the error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 25554: ordinal not in range(128)
Traceback:
'ascii' codec can't decode byte 0xe0 in position 25554: ordinal not in range(128)
Traceback (most recent call last):
File "/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1511, in __call__
rv = self.handle_exception(request, response, e)
File "/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1505, in __call__
rv = self.router.dispatch(request, response)
File "/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1253, in default_dispatcher
return route.handler_adapter(request, response)
File "/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1077, in __call__
return handler.dispatch()
File "/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 547, in dispatch
return self.handle_exception(e, self.app.debug)
File "/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 545, in dispatch
return method(*args, **kwargs)
File "/base/data/home/apps/s~govt-jobs/1.368125505627581007/checkforurls.py", line 83, in get
URLEntry.get_or_insert(url,content=result.content)
File "/python27_runtime/python27_lib/versions/1/google/appengine/ext/db/__init__.py", line 1362, in get_or_insert
return run_in_transaction(txn)
File "/python27_runtime/python27_lib/versions/1/google/appengine/api/datastore.py", line 2461, in RunInTransaction
return RunInTransactionOptions(None, function, *args, **kwargs)
File "/python27_runtime/python27_lib/versions/1/google/appengine/api/datastore.py", line 2599, in RunInTransactionOptions
ok, result = _DoOneTry(new_connection, function, args, kwargs)
File "/python27_runtime/python27_lib/versions/1/google/appengine/api/datastore.py", line 2621, in _DoOneTry
result = function(*args, **kwargs)
File "/python27_runtime/python27_lib/versions/1/google/appengine/ext/db/__init__.py", line 1359, in txn
entity = cls(key_name=key_name, **kwds)
File "/python27_runtime/python27_lib/versions/1/google/appengine/ext/db/__init__.py", line 970, in __init__
prop.__set__(self, value)
File "/python27_runtime/python27_lib/versions/1/google/appengine/ext/db/__init__.py", line 614, in __set__
value = self.validate(value)
File "/python27_runtime/python27_lib/versions/1/google/appengine/ext/db/__init__.py", line 2798, in validate
value = self.data_type(value)
File "/python27_runtime/python27_lib/versions/1/google/appengine/api/datastore_types.py", line 1163, in __new__
return super(Text, cls).__new__(cls, arg, encoding)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 25554: ordinal not in range(128)
Also as suggested by other StackOverflow answers, I tried adding the following before trying to insert to datastore:
content = content.decode("ISO-8859-1") # The encoding of the page is ISO-8859-1
content = content.encode("utf-8")
But the error persists. Please help.
Upvotes: 0
Views: 134
Reputation: 11706
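decode converts a byte string in the encoding you name into a unicode object; encode does the reverse, turning a unicode object into a byte string.
content = content.encode("utf-8") # turns the unicode object back into a UTF-8 byte string
After that line content is a byte string again, and db.TextProperty tries to decode byte strings as ASCII, which is the error in your traceback. Pass the decoded unicode object instead; the datastore stores text as UTF-8 itself.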
See this great blog post by Nick Johnson: http://blog.notdot.net/2010/07/Getting-unicode-right-in-Python
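To make the decode/encode distinction concrete, here is a minimal sketch that runs without App Engine. The helper name to_text and the sample bytes are assumptions made for illustration; the default of ISO-8859-1 comes from the encoding stated in the question.

```python
# Sketch only: to_text is a hypothetical helper, not part of App Engine.


def to_text(raw, encoding="iso-8859-1"):
    """Decode fetched bytes into a text (unicode) object."""
    return raw.decode(encoding)


# Sample bytes in ISO-8859-1; 0xe0 is the byte named in the traceback.
raw = b"voil\xe0"

text = to_text(raw)          # text object: this is what the property should get
utf8 = text.encode("utf-8")  # a byte string again, now UTF-8 encoded

# Non-ASCII bytes cannot be decoded as ASCII, which is what happens
# implicitly when such a byte string is passed as a Text value.
try:
    utf8.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# In the handler from the question this would presumably become
# (assumption, not run here):
#   URLEntry.get_or_insert(url, content=to_text(result.read()))
```

The point of the sketch is that the value handed to the datastore should be the result of decode, not of a later encode. If the pages you fetch use different encodings, the encoding argument would have to be chosen per page rather than hard-coded.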
Upvotes: 1