Reputation: 6048
I can download this file manually by pasting the URL into a browser: https://www.aaii.com/files/surveys/sentiment.xls
However, when I try to do this programmatically, I have no luck. Depending on the library I use (requests, urllib, urllib3), I either get a 403 error or some HTML containing the text 'request unsuccessful'. What's strange is that it worked a few times and I was able to download the Excel file. Then it stopped working without any change to the code. The behaviour is sporadic.
Could someone try this code to see whether they hit the same issue, or spot anything I am doing incorrectly?
UPDATE: if I wait a while and run the code again, it works. It looks as though the server may limit the number of requests in a given timeframe. It would be good if someone could check whether the same thing happens to them.
import io

import pandas as pd
import requests

url = "https://www.aaii.com/files/surveys/sentiment.xls"
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
    'Accept': '.xls,.xlsx,application/csv,application/excel,application/vnd.msexcel,application/vnd.ms-excel,application/vnd.openxmlformats-officedocument.spreadsheetml.sheet,text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'en-US,en;q=0.9',
    'Connection': 'keep-alive',
    'Upgrade-Insecure-Requests': '1',
    'DNT': '1'
}
resp = requests.get(url=url, headers=headers)
data = resp.content
print(data)
# Save the raw bytes to disk
with open('test.xls', 'wb') as output:
    output.write(data)
# Wrap the bytes in a file-like object before handing them to pandas
df = pd.read_excel(io.BytesIO(data))
# df = pd.read_excel(url, storage_options=headers)
Upvotes: 0
Views: 174
Reputation: 4827
Your code seems to work for me. However, when I ran it a second time, I got this error message:
IOPub data rate exceeded. The Jupyter server will temporarily stop sending output to the client in order to avoid crashing it. To change this limit, set the config variable --ServerApp.iopub_data_rate_limit. Current values: ServerApp.iopub_data_rate_limit=1000000.0 (bytes/sec) ServerApp.rate_limit_window=3.0 (secs)
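This message comes from a data rate limit (iopub_data_rate_limit) in Jupyter itself, most likely triggered by print(data) dumping the whole file into the notebook output, rather than from the server you are downloading from.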
Starting your notebook from the shell with:
jupyter notebook --NotebookApp.iopub_data_rate_limit=1.0e10
solved the issue for me.
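If the download server really does throttle requests, as the question's update suggests, one option is to retry with a growing delay and to print only a short summary of each response instead of the full payload. The following is a minimal sketch, not a tested fix for this particular site: the helper names looks_like_excel and fetch_with_retry, the 30-second base delay and the number of attempts are all my own assumptions, and it assumes the file is a genuine .xls or .xlsx file that can be recognised by its leading bytes.

```python
import time
from typing import Callable

# File signatures: legacy .xls files are OLE2 compound documents,
# .xlsx files are ZIP archives.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
ZIP_MAGIC = b"PK\x03\x04"


def looks_like_excel(data: bytes) -> bool:
    """Return True if the payload starts with an .xls or .xlsx signature."""
    return data.startswith(OLE2_MAGIC) or data.startswith(ZIP_MAGIC)


def fetch_with_retry(
    fetch: Callable[[], bytes],
    attempts: int = 4,
    base_delay: float = 30.0,  # assumed value, tune for the server
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Call fetch() until it returns something that looks like an Excel file.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... seconds between tries.
    """
    for attempt in range(attempts):
        data = fetch()
        if looks_like_excel(data):
            return data
        # Print a short summary only; dumping the whole payload into a
        # notebook can trip Jupyter's IOPub data rate limit.
        print(f"attempt {attempt + 1}: {len(data)} bytes, not Excel: {data[:40]!r}")
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"no Excel payload after {attempts} attempts")


# Possible usage with the url and headers from the question:
#   data = fetch_with_retry(lambda: requests.get(url, headers=headers, timeout=30).content)
#   df = pd.read_excel(io.BytesIO(data))
```

Passing the download as a callable keeps the retry logic independent of whether you use requests, urllib or urllib3.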
Upvotes: 1