Reputation: 21522
Meh, I'm not a fan of UTF-8 in Python; I can't figure out how to solve this. As you can see, I'm already trying to base64-encode the value, but it looks like Python is trying to convert it from UTF-8 to ASCII first...
In general, I'm trying to POST form data that contains UTF-8 characters with urllib2. I guess it's the same as "How to send utf-8 content in a urllib2 request?", though there is no valid answer on that one. I'm trying to send only a byte string by base64-encoding it.
Traceback (most recent call last):
  File "load.py", line 165, in <module>
    main()
  File "load.py", line 17, in main
    beers()
  File "load.py", line 157, in beers
    resp = send_post("http://localhost:9000/beers", beer)
  File "load.py", line 64, in send_post
    connection.request ('POST', req.get_selector(), *encode_multipart_data (data, files))
  File "load.py", line 49, in encode_multipart_data
    lines.extend (encode_field (name))
  File "load.py", line 34, in encode_field
    '', base64.b64encode(u"%s" % data[field_name]))
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/base64.py", line 53, in b64encode
    encoded = binascii.b2a_base64(s)[:-1]
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 7: ordinal not in range(128)
Code:
    def random_string (length):
        return ''.join (random.choice (string.ascii_letters) for ii in range (length + 1))

    def encode_multipart_data (data, files):
        boundary = random_string (30)

        def get_content_type (filename):
            return mimetypes.guess_type (filename)[0] or 'application/octet-stream'

        def encode_field (field_name):
            return ('--' + boundary,
                    'Content-Disposition: form-data; name="%s"' % field_name,
                    'Content-Transfer-Encoding: base64',
                    '', base64.b64encode(u"%s" % data[field_name]))

        def encode_file (field_name):
            filename = files [field_name]
            file_size = os.stat(filename).st_size
            file_data = open(filename, 'rb').read()
            file_b64 = base64.b64encode(file_data)
            return ('--' + boundary,
                    'Content-Disposition: form-data; name="%s"; filename="%s"' % (field_name, filename),
                    'Content-Type: %s' % get_content_type(filename),
                    'Content-Transfer-Encoding: base64',
                    '', file_b64)

        lines = []
        for name in data:
            lines.extend (encode_field (name))
        for name in files:
            lines.extend (encode_file (name))
        lines.extend (('--%s--' % boundary, ''))
        body = '\r\n'.join (lines)

        headers = {'content-type': 'multipart/form-data; boundary=' + boundary,
                   'content-length': str(len(body))}
        return body, headers

    def send_post (url, data, files={}):
        req = urllib2.Request (url)
        connection = httplib.HTTPConnection (req.get_host())
        connection.request ('POST', req.get_selector(), *encode_multipart_data (data, files))
        return connection.getresponse()
The beer object's JSON is (this is the data being passed into encode_multipart_data):
    {
        "name" : "Yuengling Oktoberfest",
        "brewer" : "Yuengling Brewery",
        "description" : "America’s Oldest Brewery is proud to offer Yuengling Oktoberfest Beer. Copper in color, this medium bodied beer is the perfect blend of roasted malts with just the right amount of hops to capture a true representation of the style. Enjoy a Yuengling Oktoberfest Beer in celebration of the season, while supplies last!",
        "abv" : 5.2,
        "ibu" : 26,
        "type" : "Lager",
        "subtype" : "",
        "color" : "",
        "seasonal" : true,
        "servingTemp" : "Cold",
        "rating" : 3,
        "inProduction": true
    }
Upvotes: 0
Views: 2288
Reputation: 178264
You can't base64-encode Unicode, only byte strings. In Python 2.7, giving a Unicode string to a function that requires a byte string causes an implicit conversion to a byte string using the ascii codec, resulting in the error you see:
    >>> base64.b64encode(u'America\u2019s')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\Python27\lib\base64.py", line 53, in b64encode
        encoded = binascii.b2a_base64(s)[:-1]
    UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 7: ordinal not in range(128)
So encode it to a byte string using a valid encoding first:
>>> base64.b64encode(u'America\u2019s'.encode('utf8'))
'QW1lcmljYeKAmXM='
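Applied to the form fields in the question, the same idea could be wrapped in a small helper. This is only a sketch: the name b64_field_value is made up for illustration, and UTF-8 is an assumption about what the receiving server expects. It formats non-byte values (text, numbers, booleans) as text, encodes that text to bytes explicitly, and only then base64-encodes it.

```python
# -*- coding: utf-8 -*-
import base64


def b64_field_value(value, encoding='utf-8'):
    """Base64-encode a form value, encoding text to bytes explicitly first.

    Hypothetical helper, not part of the original load.py.
    UTF-8 is an assumed default; use whatever the server expects.
    """
    if isinstance(value, bytes):
        # Already a byte string (plain str on Python 2): use it unchanged.
        raw = value
    else:
        # Text, numbers, booleans: format as text, then encode to bytes
        # ourselves so no implicit ASCII conversion can happen.
        raw = (u"%s" % (value,)).encode(encoding)
    return base64.b64encode(raw)


encoded = b64_field_value(u'America\u2019s')
# Decoding on the receiving side reverses both steps.
decoded = base64.b64decode(encoded).decode('utf-8')
```

In encode_field, the last tuple element would then become b64_field_value(data[field_name]) instead of base64.b64encode(u"%s" % data[field_name]).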
Upvotes: 4