Reputation: 19
I am new to Python and am learning to use it to scrape some data, but I cannot download an Excel file, for a reason I don't understand. When I open this link in any browser, the browser tries to save an Excel file:
http://www5.registraduria.gov.co/CuentasClarasPublicoCon2014/Consultas/Candidato/Formulario5xls/2
Based on a previous question (see "downloading an excel file from the web in python"), I'm using requests in Python 3 like this:
import requests, os
url="http://www5.registraduria.gov.co/CuentasClarasPublicoCon2014/Consultas/Candidato/Formulario5xls/2"
print("Downloading...")
resp = requests.get(url)
output = open('test.xls', 'wb')
output.write(resp.content)
output.close()
print("Done!")
I don't think the problem is in the code that writes the data, since test.xls is created, but as an empty file. The call to requests.get raises the following error (followed by several more):
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/requests/packages/urllib3/response.py", line 417, in _update_chunk_length
    self.chunk_left = int(line, 16)
ValueError: invalid literal for int() with base 16: b''
I also tried using urllib, but that failed too.
Upvotes: 1
Views: 9563
Reputation: 21609
Seems like this is a known issue.
One way to work around it is to use HTTP 1.0. To do this, set the httplib variables _http_vsn and _http_vsn_str like so.
For Python 2
import requests, os
import httplib
httplib.HTTPConnection._http_vsn = 10
httplib.HTTPConnection._http_vsn_str = 'HTTP/1.0'
url="http://www5.registraduria.gov.co/CuentasClarasPublicoCon2014/Consultas/Candidato/Formulario5xls/2"
print("Downloading...")
resp = requests.get(url)
with open('test.xls', 'wb') as output:
    output.write(resp.content)
print("Done!")
For Python 3, httplib was renamed to http.client, so the code becomes
import requests, os
import http.client
http.client.HTTPConnection._http_vsn = 10
http.client.HTTPConnection._http_vsn_str = 'HTTP/1.0'
url="http://www5.registraduria.gov.co/CuentasClarasPublicoCon2014/Consultas/Candidato/Formulario5xls/2"
print("Downloading...")
resp = requests.get(url)
with open('test.xls', 'wb') as output:
    output.write(resp.content)
print("Done!")
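Note that the assignments above change the HTTP version for every connection made by the process. If that is a concern, the override can be scoped to a single download. The following is only a minimal sketch, not part of the original answer: the helper name force_http10 is hypothetical, and it assumes that the installed requests/urllib3 still inherits the private _http_vsn and _http_vsn_str attributes from http.client.HTTPConnection, which is not a documented API and may change.

```python
import contextlib
import http.client


@contextlib.contextmanager
def force_http10():
    """Temporarily make http.client send HTTP/1.0 requests, then restore the defaults.

    Hypothetical helper: it patches private class attributes, so treat it as a
    workaround rather than a supported interface.
    """
    conn_cls = http.client.HTTPConnection
    saved = (conn_cls._http_vsn, conn_cls._http_vsn_str)
    conn_cls._http_vsn, conn_cls._http_vsn_str = 10, "HTTP/1.0"
    try:
        yield
    finally:
        # Restore the original values even if the download raises.
        conn_cls._http_vsn, conn_cls._http_vsn_str = saved


# Intended usage (requires the third-party requests package):
#
#     with force_http10():
#         resp = requests.get(url)
#     with open('test.xls', 'wb') as output:
#         output.write(resp.content)
```

Because HTTP/1.0 has no chunked transfer encoding, the server should send the body without chunk-size lines, which is the part of the response that failed to parse in the traceback from the question.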
Upvotes: 2