Reputation: 8203
I'm trying to send a POST request to a web app. I'm using the mechanize module (itself a wrapper around urllib2). When I try to send the POST request, I get UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 0: ordinal not in range(128). I tried unicode(string), unicode(string, encoding="utf-8"), unicode(string).encode() and so on, but nothing worked: each attempt raised either the error above or TypeError: decoding Unicode is not supported.
I looked at the other SO answers to similar questions, but none helped.
Thanks in advance!
EDIT: Example that produces an error:
>>> prda = "šđćč"  # valid UTF-8 characters
>>> prda  # typed in the Python shell
'\xc5\xa1\xc4\x91\xc4\x87\xc4\x8d'
>>> print prda
šđćč
>>> prda.encode("utf-8")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 0: ordinal not in range(128)
>>> unicode(prda)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 0: ordinal not in range(128)
Upvotes: 7
Views: 6569
Reputation: 120638
In your example, you use a non-unicode string literal containing non-ASCII characters, which results in prda becoming a byte string.
In the interactive shell, Python uses sys.stdin.encoding to encode the string automatically. In your case, this means the string gets encoded as "utf-8".
To convert prda to a unicode object, you need to decode it using the appropriate encoding:
>>> print prda.decode('utf-8')
šđćč
Note that, in a script or module, you cannot rely on Python to guess the encoding automatically. You would need to declare the encoding explicitly at the top of the file, like this:
# -*- coding: utf-8 -*-
Whenever you encounter unicode errors in Python 2, it is very often because your code is mixing byte strings with unicode strings. So you should always check what kind of string is causing the error by using type(string).
If the string object is <type 'str'> but you need unicode, decode it using the appropriate encoding. If the string object is <type 'unicode'> but you need bytes, encode it using the appropriate encoding.
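The check-then-convert rule can be wrapped in two small helpers. This is only a sketch: the names to_text and to_bytes are hypothetical (they are not part of Python or mechanize), and UTF-8 is an assumed default. It uses b'' and u'' literals so that it behaves the same on Python 2.6+ and Python 3.3+.

```python
def to_text(value, encoding='utf-8'):
    """Return a unicode (text) object, decoding byte strings if necessary."""
    if isinstance(value, bytes):  # on Python 2, bytes is an alias for str
        return value.decode(encoding)
    return value


def to_bytes(value, encoding='utf-8'):
    """Return a byte string, encoding unicode (text) objects if necessary."""
    if isinstance(value, bytes):
        return value
    return value.encode(encoding)


raw = b'\xc5\xa1\xc4\x91\xc4\x87\xc4\x8d'  # the UTF-8 bytes from the question
text = to_text(raw)
```

Calling to_text on every input at the boundary of your program, and to_bytes just before sending data out, keeps the two string types from mixing in between.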
Upvotes: 1
Reputation: 1823
You don't need to wrap your chars in unicode calls, because they're already encoded :) If anything, you need to decode them to get a unicode object:
>>> s = '\xc5\xa1\xc4\x91\xc4\x87\xc4\x8d' # your string
>>> s.decode('utf-8')
u'\u0161\u0111\u0107\u010d'
>>> type(s.decode('utf-8'))
<type 'unicode'>
I don't know mechanize, so I'm afraid I can't say whether it handles this correctly.
With a regular urllib2 POST call, I would use urlencode:
>>> import urllib2
>>> from urllib import urlencode
>>> postData = urlencode({'test': s})  # note I'm NOT decoding it
>>> postData
'test=%C5%A1%C4%91%C4%87%C4%8D'
>>> urllib2.urlopen(url, postData)  # url is your target address; etc etc etc
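For reference, here is a version-tolerant sketch of the same encoding step. The try/except import is an addition of mine (urlencode lives in urllib on Python 2 and in urllib.parse on Python 3), the field name 'test' is carried over from the session above, and no request is actually sent.

```python
try:
    from urllib import urlencode        # Python 2
except ImportError:
    from urllib.parse import urlencode  # Python 3

s = b'\xc5\xa1\xc4\x91\xc4\x87\xc4\x8d'  # UTF-8 bytes, deliberately NOT decoded

# urlencode percent-escapes each byte, so the result is plain ASCII
post_data = urlencode({'test': s})
```

Because the result contains only ASCII characters, it can be passed on as a request body without triggering any implicit decoding.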
Upvotes: 0
Reputation: 143224
I assume you're using Python 2.x.
Given a unicode object:
myUnicode = u'\u4f60\u597d'
encode it using utf-8:
mystr = myUnicode.encode('utf-8')
Note that you need to specify the encoding explicitly. By default it'll (usually) use ascii.
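A short sketch of why the explicit encoding matters: the ASCII codec cannot represent these characters, so encoding with it fails, while UTF-8 round-trips cleanly. (In Python 2, calling .encode() on a byte string first decodes it implicitly with ASCII, which is where the UnicodeDecodeError in the question comes from.) The variable names here are illustrative.

```python
my_unicode = u'\u4f60\u597d'

# Explicit UTF-8 encoding yields a byte string
my_bytes = my_unicode.encode('utf-8')

# Decoding with the same codec recovers the original text
round_trip = my_bytes.decode('utf-8')

# ASCII cannot represent these characters, so this attempt fails
try:
    my_unicode.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
```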
Upvotes: 9