Goal: download CSV files directly from a website link.
I have gone through dozens of threads that use different methods to download CSV files. Every method leaves me with the same broken file: it opens in Excel, but instead of the original data it contains what looks like code.
I have tried these methods with links from other websites and they worked perfectly, which makes me think something about the files on this specific website causes the problem.
My current code (one of many versions, all with the same result):
import requests

url = 'https://cranedata.com/publications/download/mfi-daily-data/issue/2020-09-11/.csv'
req = requests.get(url)
if req.status_code == 200:
    print(req.status_code == requests.codes.ok)
    print(req.content)  # body of this response, not the Response class attribute
    # Write the raw bytes; 'with' closes the file even if the write fails
    with open('MFID200911.csv', 'wb') as csv_file:
        csv_file.write(req.content)
I do not believe the request itself is the problem: req.status_code is 200 and req.status_code == requests.codes.ok prints True.
This yields an Excel file that looks like this: https://prnt.sc/ugx7bv
instead of the one I get when I download the file manually from the website: https://prnt.sc/ugx7u4
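A status code of 200 only says the server answered the request; it does not say the answer was a CSV file. A minimal sketch of a sanity check, assuming the unwanted file is an HTML page (for example a login form), looks at the Content-Type header and at the first bytes of the body. The helper name looks_like_html is hypothetical, not part of requests, and the test is only a heuristic.

```python
def looks_like_html(body: bytes, content_type: str = "") -> bool:
    """Heuristic: True if a downloaded payload is probably an HTML page, not CSV."""
    # A server that sends HTML usually says so in the Content-Type header.
    if "text/html" in content_type.lower():
        return True
    # Otherwise sniff the start of the body, ignoring leading whitespace and case.
    head = body.lstrip()[:100].lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")


# Usage with the question's request object:
# if looks_like_html(req.content, req.headers.get('Content-Type', '')):
#     print('Got an HTML page, not the CSV file')
```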
My end goal is to download all the CSV files in a loop, since only the date changes in the link; right now, however, I just need to get one file to download correctly.
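Since only the date changes in the link, the list of downloads can be generated up front. This sketch assumes the '/csv' URL suffix used in the edited code below and assumes the MFIDyymmdd.csv naming from the 2020-09-11 example applies to every issue; daily_downloads is a hypothetical helper name.

```python
from datetime import date, timedelta

BASE_URL = "https://cranedata.com/publications/download/mfi-daily-data/issue/"


def daily_downloads(start: date, end: date):
    """Yield (url, filename) pairs for every calendar day from start to end, inclusive."""
    for offset in range((end - start).days + 1):
        day = start + timedelta(days=offset)
        # Only the ISO date in the path changes between issues.
        url = BASE_URL + day.strftime("%Y-%m-%d") + "/csv"
        # Assumed convention, mirroring MFID200911.csv for 2020-09-11.
        filename = "MFID" + day.strftime("%y%m%d") + ".csv"
        yield url, filename
```

Days with no published issue will still produce a URL, so each response should be checked before it is saved.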
Edit: this is my code after adding the loop:
from datetime import date, timedelta
from webbot import Browser

username = 'your_username'
password = 'your_password'

web = Browser()
web.go_to('https://cranedata.com/')
web.type(username, into='username')
web.type(password, into='password')
web.click('Login', tag='login')

sdate = date(2009, 1, 1)   # start date
edate = date(2020, 9, 15)  # end date
delta = edate - sdate      # as timedelta
# Start from sdate so the list ends exactly at edate
dates = [sdate + timedelta(days=d) for d in range(delta.days + 1)]

for dateval in dates:
    # Only the date part of the download URL changes
    web.go_to('https://cranedata.com/publications/download/mfi-daily-data/issue/'
              + dateval.strftime('%Y-%m-%d') + '/csv')
Answer:
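You can use the twill or mechanize packages, as exemplified here, to log in first and then fetch the file directly. The download link appears to require a logged-in session, so a plain requests.get without logging in most likely receives an HTML page (such as the login form) instead of the CSV, even though the status code is 200.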
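The same idea (log in once, then reuse the session cookies for the download) can also be sketched with requests.Session, swapped in here in place of twill or mechanize. LOGIN_URL and the form field names in LOGIN_FIELDS are hypothetical placeholders: inspect the site's actual login form first. If the site uses CSRF tokens or a JavaScript-driven login, this simple POST will not be enough.

```python
# Hypothetical values: check the real login form's action URL and input names.
LOGIN_URL = "https://cranedata.com/"
LOGIN_FIELDS = ("username", "password")


def download_after_login(session, username: str, password: str,
                         url: str, path: str) -> bool:
    """Log in with session, fetch url with the same cookies, and save it to path.

    Returns False (and writes nothing) if the response is labelled as HTML,
    which usually means the login did not take effect.
    """
    user_field, pass_field = LOGIN_FIELDS
    # The session keeps any cookies set by the login response.
    login = session.post(LOGIN_URL, data={user_field: username, pass_field: password})
    login.raise_for_status()
    resp = session.get(url)
    resp.raise_for_status()
    if "text/html" in resp.headers.get("Content-Type", "").lower():
        return False
    with open(path, "wb") as fh:
        fh.write(resp.content)
    return True


# Usage (requires the requests package):
# import requests
# with requests.Session() as s:
#     ok = download_after_login(s, 'your_username', 'your_password', url, 'MFID200911.csv')
```

Passing the session in, rather than creating it inside the function, lets a loop over many dates log in once and reuse one connection.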
Or you can use a browser automation tool such as webbot to simulate a user logging in and navigating to the file:
from webbot import Browser
username = 'your_username'
password = 'your_password'
web = Browser()
web.go_to('https://cranedata.com/')
web.type(username, into='username')
web.type(password, into='password')
web.click('Login', tag='login')
web.go_to('https://cranedata.com/publications/download/mfi-daily-data/issue/2020-09-11/.csv')