PyCoderL1

Reputation: 29

Problems with requests in Python 3 retrieving an Excel file from a WP site

Here's my problem: I am trying to download Excel .xlsx files from a WP site. If I type the URL that I assigned to a variable called stock in my code directly into the browser, Firefox downloads the file perfectly.

I'm trying to do this with Python so I've made a script using requests and then Pandas for processing and manipulation.

However, even though the file seems to download, the script returns an error. I tried using both open and with open, as suggested in similar questions I've found here, but in my case I get 'ValueError: Seek of closed file'. I tried several variations of the code, and the outcome was always that same error.

Here is my code

import pandas as pd
import requests, os
import http.client

http.client.HTTPConnection._http_vsn = 10
http.client.HTTPConnection._http_vsn_str = 'HTTP/1.0'

# Url of the same link I used to manually fetch the file
stock = 'https://filmar.com/wp-content/uploads/2021/05/Apple-Lot-5-14-21.xlsx'

resp = requests.get(stock)  # passed the GET method to the http request with the URL

print("Downloading...") # This works

# When I try to retrieve the file it fails 
with open('Apple-Lot-5-14-21.xlsx', 'wb') as output:
    output.write(resp.content)
    
print('The file has been downloaded') # this is printed

# The error happens when I try to assign the file to the pd.read_excel method in Pandas
apple = pd.read_excel(output)

Addendum

After inspecting the resp object as suggested by @MattDMo, it looks like there's a permission problem or something similar: the response object (a requests.models.Response) returned a 404 Not Found. So either the server has some protection in place or a redirect happens on its side, and requests retrieves an empty file.

Upvotes: 0

Views: 900

Answers (1)

MattDMo

Reputation: 102862

You can't pass output to pd.read_excel(), because when the with context manager exits, the file that output refers to is closed; the name still exists, but any read or seek on it raises 'ValueError: Seek of closed file'. One option here, if you don't really need to save the Excel file for anything else, is to pass resp.content directly to read_excel(). Alternatively, if you want the Excel file for backup or other purposes, create a filename variable like so:

xl_file = 'Apple-Lot-5-14-21.xlsx'

then use that variable both when you're calling with open(... and when you're calling read_excel(), as that function can take both file names and file-like objects.
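To make the closed-file behaviour concrete, here is a minimal sketch that uses only the standard library. The content bytes are a stand-in for resp.content and the temporary directory is an assumption for illustration; the pandas calls appear only in comments, since read_excel() accepts either a file name or a file-like object such as io.BytesIO.

```python
import io
import os
import tempfile

# Stand-in for resp.content (assumption: any bytes show the same behaviour).
content = b"example payload"

# Keep the file name in a variable so it can be reused after the file is closed.
xl_file = os.path.join(tempfile.mkdtemp(), "Apple-Lot-5-14-21.xlsx")

with open(xl_file, "wb") as output:
    output.write(content)

# The name `output` still exists here, but the file behind it is closed.
print(output.closed)  # True

try:
    output.seek(0)
    seek_error = None
except ValueError as exc:
    seek_error = exc  # this is the error pandas runs into

# Option 1: go through the file name, as in pd.read_excel(xl_file).
with open(xl_file, "rb") as fh:
    from_disk = fh.read()

# Option 2: skip the disk and wrap the bytes in an in-memory file-like object,
# as in pd.read_excel(io.BytesIO(resp.content)).
buffer = io.BytesIO(content)
from_memory = buffer.read()
```

Either way, read_excel() gets something it can still read from after the with block has finished.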

As an extra note, I'm not sure why you're using http.client, as requests doesn't look at any of those values to my knowledge.
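Regarding the 404 mentioned in the question's addendum, it may help to check the response before writing anything to disk. The sketch below is only an illustration: check_xlsx_download is a hypothetical helper, not part of requests or pandas. It relies on the fact that an .xlsx workbook is a ZIP archive, so an error page or empty body can be caught early.

```python
import io
import zipfile


def check_xlsx_download(status_code: int, content: bytes) -> None:
    """Raise if the response does not look like a successful .xlsx download.

    Hypothetical helper for illustration only.
    """
    if status_code != 200:
        raise RuntimeError(f"download failed with HTTP status {status_code}")
    # An .xlsx workbook is a ZIP archive, so a non-ZIP body is not a workbook.
    if not zipfile.is_zipfile(io.BytesIO(content)):
        raise ValueError("response body is not a ZIP archive, so not an .xlsx file")


# Intended use with the names from the question (not executed here):
#   resp = requests.get(stock)
#   check_xlsx_download(resp.status_code, resp.content)
```

requests also offers resp.raise_for_status(), which raises on 4xx and 5xx responses, if only the status check is needed.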

Upvotes: 1
