mike01010

Reputation: 6048

Unable to programmatically download xls file

I can manually download this file by pasting the URL into a browser: https://www.aaii.com/files/surveys/sentiment.xls

However, when I try to do this programmatically, I have no luck. Depending on the library I use (requests, urllib, urllib3), I get either a 403 error or some HTML containing the text 'request unsuccessful'. What's strange is that it worked a few times and I was able to download the Excel file, then it stopped working without any code changes. The failures are sporadic.

I am wondering if someone can try this code to see whether they have the same issue, or spot anything I am doing incorrectly.

UPDATE: It seems that if I wait a while and run the code again, it works. It is as though the server limits the number of requests in a given timeframe. It would be good if someone could check whether that is happening to them too.

import io

import pandas as pd
import requests

url = "https://www.aaii.com/files/surveys/sentiment.xls"

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
    'Accept': '.xls,.xlsx,application/csv,application/excel,application/vnd.msexcel,application/vnd.ms-excel,application/vnd.openxmlformats-officedocument.spreadsheetml.sheet,text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'en-US,en;q=0.9',
    'Connection': 'keep-alive',
    'Upgrade-Insecure-Requests': '1',
    'DNT': '1'
}

resp = requests.get(url=url, headers=headers)
data = resp.content
print(data)
with open('test.xls', 'wb') as output:
    output.write(data)

# Wrap the raw bytes in a file-like object; recent pandas versions
# deprecate passing a bytes object to read_excel directly.
df = pd.read_excel(io.BytesIO(data))
# df = pd.read_excel(url, header=headers)
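
To check the rate-limit theory from the update, the request could be retried with growing pauses between attempts. The following is only a minimal sketch: fetch_with_backoff, its parameters, and the choice to treat any non-200 status as a failure are assumptions for illustration, and the delays the server actually tolerates are unknown.

```python
import time


def fetch_with_backoff(fetch, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it returns a 200 response or attempts run out.

    fetch is any zero-argument callable returning an object with a
    status_code attribute (hypothetical helper, not part of requests).
    """
    resp = None
    for attempt in range(attempts):
        resp = fetch()
        if resp.status_code == 200:
            return resp
        # Wait 1x, 2x, 4x, ... base_delay before the next attempt.
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return resp  # last response, still unsuccessful


# Possible usage with the url and headers defined above:
# resp = fetch_with_backoff(lambda: requests.get(url, headers=headers, timeout=30))
```

If the download succeeds only after the longer pauses, that would support the idea that the server throttles repeated requests. A 200 status alone does not prove the body is the spreadsheet, since the 'request unsuccessful' page is HTML.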

Upvotes: 0

Views: 174

Answers (1)

René

Reputation: 4827

Your code seems to work for me. However, when I ran it a second time, I got this error message:

IOPub data rate exceeded. The Jupyter server will temporarily stop sending output to the client in order to avoid crashing it. To change this limit, set the config variable --ServerApp.iopub_data_rate_limit.

Current values: ServerApp.iopub_data_rate_limit=1000000.0 (bytes/sec) ServerApp.rate_limit_window=3.0 (secs)

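So a data rate limit is being hit, but this particular message comes from the Jupyter server itself rather than from the site you are downloading from. It is most likely triggered by print(data), which sends the entire file to the notebook output and exceeds iopub_data_rate_limit.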

Starting the notebook from the shell with:
jupyter notebook --NotebookApp.iopub_data_rate_limit=1.0e10
solved the issue for me.
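
Another option that avoids sending the whole file to the notebook output is to print a short summary of the response instead of the raw bytes. This is a rough sketch, assuming the file is a genuine legacy .xls workbook, which starts with the OLE2 signature bytes; looks_like_xls and summarize are hypothetical helper names, not part of requests or pandas.

```python
# Signature at the start of legacy .xls (OLE2 compound) files.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_xls(content: bytes) -> bool:
    """Return True if the payload starts with the OLE2 signature."""
    return content.startswith(OLE2_MAGIC)


def summarize(content: bytes) -> str:
    """Describe the payload in one short line instead of dumping it."""
    kind = "xls" if looks_like_xls(content) else "not xls (possibly an HTML error page)"
    return f"{len(content)} bytes, {kind}"


# Possible usage in place of print(data):
# print(summarize(resp.content))
# if looks_like_xls(resp.content):
#     df = pd.read_excel(io.BytesIO(resp.content))
```

This also catches the case from the question where the server returns an HTML 'request unsuccessful' page instead of the spreadsheet, before pandas tries to parse it.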

Upvotes: 1
