Reputation: 106390
I'm running into an issue dealing with non-ASCII POST parameters. Here's a cURL request that shows the problem:
curl "http://localhost:8000/api/txt/" -d \
"sender=joe&comments=Bus%20%A3963.33%20London%20to%20Sydney"
The pound sign in comments is causing the issue: when I try to do just about anything with request.POST['comments'], I get:
UnicodeEncodeError: 'ascii' codec can't encode character u'\ufffd' in position 4: ordinal not in range(128)
For example, if I just try to log what comments is:
message = request.POST.get('comments', None)
file('/tmp/comments.txt', 'wb').write(message)
I get the above error. Or when I try to decode it, I get the same error:
try:
    message = message.decode('ISO-8859-2','ignore').encode('utf-8','ignore')
except Exception, e:
    file('/tmp/ERROR-decode.txt','w').write(str(e))
produces ERROR-decode.txt with:
'ascii' codec can't encode character u'\ufffd' in position 4: ordinal not in range(128)
Ideas?
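Both tracebacks mention u'\ufffd', the Unicode replacement character, which suggests the byte 0xA3 had already been replaced while the request body was decoded, before the view ever saw it. The sketch below is a Python 3 approximation, assuming the body is decoded as UTF-8 with errors="replace" (an assumption, not something the question confirms). Python 3 never encodes implicitly, so the ASCII failure that Python 2 hits inside file.write(unicode) and unicode.decode(...) is reproduced here with an explicit .encode("ascii").

```python
# Python 3 sketch; the Python 2 code above fails at the equivalent points.
raw_body = b"Bus \xa3963.33 London to Sydney"  # %A3 unquotes to a lone 0xA3 byte

# Assumption: the body is decoded as UTF-8 with errors="replace".
# A lone 0xA3 is not valid UTF-8, so it becomes U+FFFD.
comment = raw_body.decode("utf-8", errors="replace")

# Python 2 implicitly encoded unicode with the default ASCII codec in
# file.write(unicode) and unicode.decode(...); doing it explicitly fails the same way.
try:
    comment.encode("ascii")
except UnicodeEncodeError as exc:
    print(exc)  # 'ascii' codec can't encode character '\ufffd' in position 4: ordinal not in range(128)

# Encoding explicitly as UTF-8 succeeds, but the original character is already lost.
print(comment.encode("utf-8"))  # b'Bus \xef\xbf\xbd963.33 London to Sydney'
```

In other words, no later .decode()/.encode() call can bring the pound sign back once it has been replaced; the bytes have to be correct (or decoded with the right codec) before that point.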
Upvotes: 2
Views: 2988
Reputation: 799580
%A3 is wrong. It should in fact be %C2%A3 (£, U+00A3) or %C5%81 (Ł, U+0141, which is what byte 0xA3 means in ISO-8859-2) in order to be correct UTF-8.
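A short Python 3 sketch using only the standard library's urllib.parse: percent-encoding the two characters as UTF-8 gives exactly the sequences above, and urlencode() builds a correctly encoded body that could be passed to curl -d instead of hand-written escapes (it writes spaces as + rather than %20, which form decoders accept).

```python
from urllib.parse import parse_qs, quote, unquote, urlencode

# quote() percent-encodes UTF-8 by default: U+00A3 -> C2 A3, U+0141 -> C5 81.
print(quote("£"))  # %C2%A3
print(quote("Ł"))  # %C5%81

# A lone %A3 is not valid UTF-8, so a UTF-8 decoder can only substitute U+FFFD.
print(unquote("%A3") == "\ufffd")  # True

# Build the POST body from text instead of hand-writing the escapes.
body = urlencode({"sender": "joe", "comments": "Bus £963.33 London to Sydney"})
print(body)  # sender=joe&comments=Bus+%C2%A3963.33+London+to+Sydney

# Round trip: decoding the body gives back the original pound sign.
print(parse_qs(body)["comments"][0])  # Bus £963.33 London to Sydney
```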
Also, see "Unicode In Python, Completely Demystified".
Upvotes: 2
Reputation: 143935
I think you first have to pass it through urllib.unquote() to undo the percent-encoding (URL quoting) applied to the form body; then you can convert the string to unicode with the proper encoding:
>>> unicode(urllib.unquote("Bus%20%A3963.33%20London%20to%20Sydney"), \
"iso-8859-2").encode("utf-8")
'Bus \xc5\x81963.33 London to Sydney'
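For comparison, a Python 3 sketch of the same idea: urllib.unquote moved to urllib.parse, and unquote_to_bytes() keeps the raw 0xA3 byte instead of decoding it as UTF-8. Which legacy encoding the client actually used is an assumption: byte 0xA3 is Ł in ISO-8859-2, as in the output above, but £ in ISO-8859-1 (Latin-1), which matches the pound sign in the question.

```python
from urllib.parse import unquote_to_bytes

quoted = "Bus%20%A3963.33%20London%20to%20Sydney"

# Undo the percent-encoding but keep raw bytes (0xA3 stays a single byte).
raw = unquote_to_bytes(quoted)

# The same byte means different characters in different legacy encodings.
as_latin2 = raw.decode("iso-8859-2")  # 'Bus Ł963.33 London to Sydney'
as_latin1 = raw.decode("iso-8859-1")  # 'Bus £963.33 London to Sydney'

# Re-encode as UTF-8; the first line matches the Python 2 result above.
print(as_latin2.encode("utf-8"))  # b'Bus \xc5\x81963.33 London to Sydney'
print(as_latin1.encode("utf-8"))  # b'Bus \xc2\xa3963.33 London to Sydney'
```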
Upvotes: 0