
Send a non-ASCII POST request in Python?

I'm trying to send a POST request to a web app using the mechanize module (itself a wrapper around urllib2). When I send the request, I get UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 0: ordinal not in range(128). I tried unicode(string), unicode(string, encoding="utf-8"), unicode(string).encode() and so on, but nothing worked: each attempt either raised the error above or TypeError: decoding Unicode is not supported.

I looked at the other SO answers to similar questions, but none helped.

Thanks in advance!

EDIT: An example that produces the error (in the Python shell):

>>> prda = "šđćč"  # valid UTF-8 characters
>>> prda
'\xc5\xa1\xc4\x91\xc4\x87\xc4\x8d'
>>> print prda
šđćč
>>> prda.encode("utf-8")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 0: ordinal not in range(128)
>>> unicode(prda)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 0: ordinal not in range(128)

Upvotes: 7

Answers (3)

ekhumoro

Reputation: 120638

In your example, you use a non-unicode string literal containing non-ASCII characters, so prda ends up as a byte string.

The exact bytes depend on the encoding of your terminal, which Python reports as sys.stdin.encoding. In your case, the string is encoded as UTF-8.

To convert prda to a unicode object, you need to decode it using the appropriate encoding:

>>> print prda.decode('utf-8')
šđćč
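A minimal sketch of both halves of this explanation, written to run on Python 2 or 3. The b'' literal stands in for the bytes your UTF-8 terminal produced. Decoding with ASCII, which is what Python 2 does implicitly when you call unicode(prda) or prda.encode('utf-8') on a byte string, fails with exactly the error from the question; decoding with UTF-8 succeeds.

```python
# The bytes a UTF-8 terminal produces for "šđćč"
prda = b'\xc5\xa1\xc4\x91\xc4\x87\xc4\x8d'

# Decoding with ASCII reproduces the error from the question
# (Python 2 performs this ASCII decode implicitly in unicode(prda)).
ascii_error = None
try:
    prda.decode('ascii')
except UnicodeDecodeError as exc:
    ascii_error = str(exc)

# Decoding with the encoding the bytes were actually written in works.
text = prda.decode('utf-8')
print(ascii_error)
print(len(text))  # number of characters, not bytes
```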

Note that in a script or module you cannot rely on Python to guess the encoding; you need to declare it explicitly at the top of the file, like this:

# -*- coding: utf-8 -*-
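A minimal sketch of such a module, assuming the file is saved as UTF-8. With the declaration in place, a u"" literal gives you a unicode object directly, so there is nothing left to decode; you only encode when a byte string is needed. (Python 3 assumes UTF-8 source by default, so there the declaration is optional.)

```python
# -*- coding: utf-8 -*-
# The declaration above tells Python how to decode this file's bytes,
# so the u"" literal below becomes a proper unicode object.
prda = u"šđćč"

# Four characters, not the eight bytes of the UTF-8 form.
assert len(prda) == 4

# Encode explicitly when a byte string is needed (e.g. for a request body).
body = prda.encode('utf-8')
```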

Whenever you encounter Unicode errors in Python 2, it is very often because your code is mixing byte strings with unicode strings. So you should always check what kind of string is causing the error, by using type(string).

If the string object is <type 'str'>, but you need unicode, decode it using the appropriate encoding. If the string object is <type 'unicode'>, but you need bytes, encode it using the appropriate encoding.
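The rule above can be wrapped in two small helpers. The names to_unicode and to_bytes are hypothetical, and UTF-8 is assumed as the default encoding; adjust it to whatever encoding your data actually uses. The sketch runs on Python 2 (where bytes is an alias for str) and Python 3.

```python
def to_unicode(value, encoding='utf-8'):
    """Return a text (unicode) object, decoding byte strings with `encoding`."""
    if isinstance(value, bytes):  # <type 'str'> in Python 2
        return value.decode(encoding)
    return value  # already text


def to_bytes(value, encoding='utf-8'):
    """Return a byte string, encoding text (unicode) objects with `encoding`."""
    if isinstance(value, bytes):
        return value  # already bytes
    return value.encode(encoding)


# Round trip: bytes -> unicode -> the same bytes
raw = b'\xc5\xa1\xc4\x91\xc4\x87\xc4\x8d'
restored = to_bytes(to_unicode(raw))
```

Calling the helper that matches the type you need, rather than calling unicode() or encode() blindly, avoids both errors from the question.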

Upvotes: 1

Giacomo Lacava

Reputation: 1823

You don't need to wrap your characters in unicode() calls, because they're already encoded :) If anything, you need to decode them to get a unicode object:

>>> s = '\xc5\xa1\xc4\x91\xc4\x87\xc4\x8d'   # your string
>>> s.decode('utf-8')
u'\u0161\u0111\u0107\u010d'
>>> type(s.decode('utf-8'))
<type 'unicode'>
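A quick check, runnable on Python 2 or 3, of why the distinction matters: the byte string holds eight bytes (two per character in UTF-8), the decoded unicode object holds four characters, and encoding it back with UTF-8 restores the original bytes.

```python
s = b'\xc5\xa1\xc4\x91\xc4\x87\xc4\x8d'  # UTF-8 bytes for "šđćč"
u = s.decode('utf-8')                   # unicode object

byte_count = len(s)                     # counts bytes
char_count = len(u)                     # counts characters
round_trip = u.encode('utf-8') == s     # encoding reverses decoding
```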

I don't know mechanize, so I'm afraid I can't say whether it handles this correctly.

With a regular urllib2 POST call, I would use urlencode:

>>> import urllib2
>>> from urllib import urlencode
>>> postData = urlencode({'test': s})   # note I'm NOT decoding it
>>> postData
'test=%C5%A1%C4%91%C4%87%C4%8D'
>>> urllib2.urlopen(url, postData)   # url is the address of the form handler
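A self-contained version of the same idea that only builds the request (it does not send it), written to run on Python 2 or 3. The URL http://example.com/form is a placeholder. urlencode percent-encodes the UTF-8 bytes, so the request body is plain ASCII, and passing it as the request data makes urllib2 (urllib.request in Python 3) use POST.

```python
try:  # Python 3
    from urllib.parse import urlencode
    from urllib.request import Request
except ImportError:  # Python 2
    from urllib import urlencode
    from urllib2 import Request

s = b'\xc5\xa1\xc4\x91\xc4\x87\xc4\x8d'  # UTF-8 bytes, deliberately not decoded
url = 'http://example.com/form'          # placeholder URL

post_data = urlencode({'test': s})       # percent-encoded, pure ASCII
req = Request(url, post_data.encode('ascii'))

# Supplying data makes this a POST; urlopen(req) would actually send it.
method = req.get_method()
```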

Upvotes: 0

Laurence Gonsalves

Reputation: 143224

I assume you're using Python 2.x.

Given a unicode object:

myUnicode = u'\u4f60\u597d'

encode it using utf-8:

mystr = myUnicode.encode('utf-8')

Note that you need to specify the encoding explicitly. Otherwise it uses the default encoding (sys.getdefaultencoding()), which is usually ASCII.
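A runnable version of the snippet above (Python 2 or 3). The explicit 'ascii' attempt is included only to show why relying on an ASCII default fails for these characters.

```python
myUnicode = u'\u4f60\u597d'        # two CJK characters
mystr = myUnicode.encode('utf-8')  # explicit UTF-8: three bytes per character here

try:
    myUnicode.encode('ascii')      # what an ASCII default would attempt
except UnicodeEncodeError:
    ascii_ok = False
else:
    ascii_ok = True
```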

Upvotes: 9
