Gaborio

Reputation: 19

Requests: Download an excel file using python 3 (invalid literal for int() with base 16)

I am new to Python and am learning to use it to scrape some data, but I cannot download an Excel file, for a reason I don't understand. When I open this link in any browser, it tries to save an Excel file:

http://www5.registraduria.gov.co/CuentasClarasPublicoCon2014/Consultas/Candidato/Formulario5xls/2

Based on a previous question (see "downloading an excel file from the web in python"), I'm using requests in Python 3 like this:

import requests, os


url="http://www5.registraduria.gov.co/CuentasClarasPublicoCon2014/Consultas/Candidato/Formulario5xls/2"

print("Downloading...")
resp = requests.get(url)
output = open('test.xls', 'wb')
output.write(resp.content)
output.close()
print("Done!")

I don't think the problem is in the code that writes the data, since test.xls is created, although as an empty file. The requests.get call raises the following error (followed by several more):

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/requests/packages/urllib3/response.py", line 417, in _update_chunk_length
    self.chunk_left = int(line, 16)
ValueError: invalid literal for int() with base 16: b''

I also tried urllib, but that failed as well.

Upvotes: 1

Views: 9563

Answers (1)

Paul Rooney

Reputation: 21609

Seems like this is a known issue.
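For context, the traceback points at the step where urllib3 reads the size line of a chunked response and parses it as a hexadecimal number. The sketch below is a simplified stand-in for that step, not urllib3's actual code; the function name parse_chunk_length is made up for illustration. It shows why an empty size line produces exactly this ValueError.

```python
def parse_chunk_length(line):
    """Parse the size line of one HTTP chunk (hypothetical helper, simplified)."""
    # A chunk size line looks like b"1a\r\n", optionally followed by
    # ";extension" parameters, which are dropped here.
    size_field = line.split(b";", 1)[0].strip()
    # The size is written in hexadecimal. An empty line, as in the
    # traceback above, cannot be parsed and raises ValueError.
    return int(size_field, 16)
```

Calling parse_chunk_length(b"1a\r\n") gives 26, while parse_chunk_length(b"") raises the same "invalid literal for int() with base 16" error, which suggests the server sent an empty line where a chunk size was expected.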

One way to work around it is to use HTTP 1.0, which has no chunked transfer encoding, so the server should not send a chunked response. To do this, set the httplib variables _http_vsn and _http_vsn_str like so.

For Python 2

import requests, os
import httplib

httplib.HTTPConnection._http_vsn = 10
httplib.HTTPConnection._http_vsn_str = 'HTTP/1.0'

url="http://www5.registraduria.gov.co/CuentasClarasPublicoCon2014/Consultas/Candidato/Formulario5xls/2"

print("Downloading...")
resp = requests.get(url)
with open('test.xls', 'wb') as output:
    output.write(resp.content)
print("Done!")

For Python 3, httplib was renamed to http.client, so the code becomes

import requests, os
import http.client

http.client.HTTPConnection._http_vsn = 10
http.client.HTTPConnection._http_vsn_str = 'HTTP/1.0'

url="http://www5.registraduria.gov.co/CuentasClarasPublicoCon2014/Consultas/Candidato/Formulario5xls/2"

print("Downloading...")
resp = requests.get(url)
with open('test.xls', 'wb') as output:
    output.write(resp.content)
print("Done!")
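Setting these class attributes changes the HTTP version for every connection the process makes afterwards. If that is a concern, one option is to apply the change only around the affected request and restore the previous values when done. This is a minimal sketch under assumptions: force_http10 is a made-up helper name, and _http_vsn and _http_vsn_str are private attributes that a future Python or urllib3 version may stop honoring.

```python
import contextlib
import http.client


@contextlib.contextmanager
def force_http10():
    """Temporarily make http.client speak HTTP/1.0 (hypothetical helper)."""
    conn_cls = http.client.HTTPConnection
    # Remember the current values so they can be restored afterwards.
    saved = (conn_cls._http_vsn, conn_cls._http_vsn_str)
    conn_cls._http_vsn = 10
    conn_cls._http_vsn_str = 'HTTP/1.0'
    try:
        yield
    finally:
        # Restore the previous protocol version even if the request fails.
        conn_cls._http_vsn, conn_cls._http_vsn_str = saved
```

With this in place, the download above would be wrapped as "with force_http10(): resp = requests.get(url)", and requests made outside the with block keep using the default HTTP version.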

Upvotes: 2
