Reputation: 29
Here's my problem: I am trying to download Excel .xlsx files from a WordPress site. If I type
the URL (which I assign to a variable called stock in my code) directly into the browser, Firefox downloads the file perfectly.
I'm trying to do the same thing with Python, so I wrote a script that uses requests for the download and then pandas for processing and manipulation.
However, even though the file seems to download, the script raises an error. I tried both open and with open, as suggested in similar questions I found here, but in my case I get 'ValueError: Seek of closed file'. I tried several variations of the code, and the outcome was always the same error.
Here is my code:
import pandas as pd
import requests, os
import http.client
http.client.HTTPConnection._http_vsn = 10
http.client.HTTPConnection._http_vsn_str = 'HTTP/1.0'
# Url of the same link I used to manually fetch the file
stock = 'https://filmar.com/wp-content/uploads/2021/05/Apple-Lot-5-14-21.xlsx'
resp = requests.get(stock) # passed the GET method to the http request with the URL
print("Downloading...") # This works
# When I try to retrieve the file it fails
with open('Apple-Lot-5-14-21.xlsx', 'wb') as output:
    output.write(resp.content)
print('The file has been downloaded') # this is printed
# The error happens when I try to assign the file to the pd.read_excel method in Pandas
apple = pd.read_excel(output)
Addendum
After inspecting the resp object as suggested by @MattDMo, it looks like there is a permission problem of some kind: the response object (a requests.models.Response) has a status code of 404, Not Found. So either the server is protecting the file or some redirection takes place on the server, and requests retrieves an empty file.
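A minimal sketch of a check that would catch this before anything is written to disk. The helper name looks_like_xlsx is hypothetical; it relies on the fact that .xlsx files are ZIP archives and therefore start with the bytes PK, and it assumes the server signals a failed download with a non-200 status or a non-ZIP body:

```python
def looks_like_xlsx(status_code: int, content: bytes) -> bool:
    """Return True only for a 200 response whose body starts like a ZIP archive.

    .xlsx workbooks are ZIP containers, so a genuine download begins with b"PK".
    An HTML error page or an empty body fails this check.
    """
    return status_code == 200 and content[:2] == b"PK"
```

With the question's variables this would be called as looks_like_xlsx(resp.status_code, resp.content) right after requests.get(stock), and the file would only be saved or parsed when it returns True.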
Upvotes: 0
Views: 900
Reputation: 102862
You can't pass output to pd.read_excel(), because when the with context manager exits, the file object that output refers to is closed, which is why you get the 'Seek of closed file' error. One option here, if you don't really need to save the Excel file for anything else, is to pass resp.content directly to read_excel() (wrapped in io.BytesIO if your pandas version warns about or rejects raw bytes). Alternatively, if you want the Excel file for backup or other purposes, create a filename variable like so:
xl_file = 'Apple-Lot-5-14-21.xlsx'
then use that variable both when you're calling with open(...) and when you're calling read_excel(), as that function can take both file names and file-like objects.
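A minimal sketch of the two options described above. The helper names save_download and load_excel are hypothetical, and wrapping the bytes in io.BytesIO is an assumption made so the same call works on pandas versions that do not accept raw bytes:

```python
import io


def save_download(content: bytes, xl_file: str) -> str:
    """Write the downloaded bytes to disk and return the path, not the handle."""
    with open(xl_file, "wb") as output:
        output.write(content)
    # The file object 'output' is closed at this point; only the path is
    # still usable, so that is what gets handed to pandas afterwards.
    return xl_file


def load_excel(source):
    """Read a workbook from a file name or from raw bytes held in memory."""
    # Local import so save_download() stays usable without pandas installed.
    # Reading .xlsx files also requires the openpyxl package.
    import pandas as pd

    if isinstance(source, bytes):
        source = io.BytesIO(source)  # give pandas a seekable file-like object
    return pd.read_excel(source)
```

With the question's variables, that would be apple = load_excel(resp.content) for the in-memory route, or apple = load_excel(save_download(resp.content, xl_file)) to keep a copy on disk.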
As an extra note, I'm not sure why you're using http.client, as requests doesn't look at any of those values to my knowledge.
Upvotes: 1