Reputation: 3809
I have a unicode string that was encoded on the client side using JS encodeURIComponent.
If I use the following in Python locally, I get the expected result:
>>> urllib.unquote("Foo%E2%84%A2%20Bar").decode("utf-8")
>>> u'Foo\u2122 Bar'
But when I run this in Google App Engine, I get:
Traceback (most recent call last):
File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/webapp/_webapp25.py", line 703, in __call__
handler.post(*groups)
File "/base/data/home/apps/s~kaon-log/2.357769827131038147/main.py", line 143, in post
path_uni = urllib.unquote(h.path).decode('utf-8')
File "/base/python_runtime/python_dist/lib/python2.5/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 3-5: ordinal not in range(128)
I'm still using Python 2.5, in case that makes a difference. What am I missing?
Upvotes: 11
Views: 8164
Reputation: 1
Use this class instead of urllib2:
class URL(object):
'''encode/decode requested URL'''
def __init__(self):
self.charset = '/abcdefghijklmnopqrstuvwxyz-ABCDEFGHIJKLMNOPQRSTUVWXYZ_0123456789.'
def encode(self,url):
return ''.join([c if c in self.charset else '%'+c.encode('hex').upper() for c in url])
def decode(self,url):
return re.compile('%([0-9a-fA-F]{2})',re.M).sub(lambda m: chr(int(m.group(1),16)), url)
Example:
import re
URL().decode(your URL)
Upvotes: 0
Reputation: 14487
My guess is that h.path
is a unicode object. Then urllib.unquote
would return a unicode object. When decode
is called on a unicode object at first it is converted to str
using default encoding (which is ascii) and here you get the 'ascii' codec can't encode
exception.
Here is a proof:
>>> urllib.unquote(u"Foo%E2%84%A2%20Bar").decode("utf-8")
...
UnicodeEncodeError: 'ascii' codec can't encode characters in position 3-5: ordinal not in range(128)
This should work:
urllib.unquote(h.path.encode('utf-8')).decode("utf-8")
There is a stackoverflow thread which explains why unicode doesn't work with urllib.unquote
: How to unquote a urlencoded unicode string in python?
Upvotes: 10