Downloading CSV files directly from website link

Question

Goal: Download CSV files from a website directly through their download links.

I have gone through dozens of threads and tried different methods to download CSV files. Every method leaves me with the same broken file: when opened in Excel, it contains some code instead of the original data.

I have tried these methods with links from other websites and they worked perfectly, which makes me think something about the files from this specific website causes the problem.

My current code (one of many different versions, all yielding the same result):

import requests

url = 'https://cranedata.com/publications/download/mfi-daily-data/issue/2020-09-11/.csv'
req = requests.get(url)
print(req)
if req.status_code == 200:
    print(req.status_code == requests.codes.ok)
    # Show the start of the body that is about to be saved
    print(req.content[:200])
    with open('MFID200911.csv', 'wb') as csv_file:
        csv_file.write(req.content)

I do not believe the request itself is failing, since I get 200 and True as outputs for req and req.status_code == requests.codes.ok.

This yields an Excel file that looks like this: https://prnt.sc/ugx7bv

Instead of the one I see when manually downloading the file from the website: https://prnt.sc/ugx7u4

My end goal is to download all the CSV files in a loop, since only the date changes in the link. Right now, however, I just need to get one file to download correctly.

Edit: This is the code after implementing the loop

from datetime import date, timedelta
from webbot import Browser

# username and password hold the account credentials
web = Browser()
web.go_to('https://cranedata.com/')
web.type(username, into='username')
web.type(password, into='password')
web.click('Login', tag='login')

sdate = date(2009, 1, 1)    # start date
edate = date(2020, 9, 15)   # end date
delta = edate - sdate       # as timedelta
# One entry per calendar day from sdate to edate, inclusive
dates = [sdate + timedelta(days=dval) for dval in range(delta.days + 1)]

for dateval in dates:
    web.go_to('https://cranedata.com/publications/download/mfi-daily-data/issue/' + dateval.strftime('%Y-%m-%d') + '/csv')

Daniel Labbe · Accepted Answer

You can use the twill or mechanize packages, as exemplified here, to get the file directly after login.
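
If you would rather stay in plain Python, the same log-in-then-download idea can be sketched with the standard library alone. A 200 status by itself does not prove the body is the CSV: a site that requires login will often answer with an HTML page instead, which would likely explain the "code" in the saved file. The sketch below is only an illustration under assumptions: LOGIN_URL and the form field names username and password are guesses, not taken from the site, so check the actual login form before relying on it.

```python
import http.cookiejar
import urllib.parse
import urllib.request

# Assumption: the real login endpoint and form field names are not given in
# this thread; inspect the site's login form and replace these placeholders.
LOGIN_URL = "https://cranedata.com/login"  # hypothetical
CSV_URL = "https://cranedata.com/publications/download/mfi-daily-data/issue/2020-09-11/.csv"


def looks_like_html(content_type: str, body: bytes) -> bool:
    """Return True if the response appears to be an HTML page, not a CSV."""
    if "html" in content_type.lower():
        return True
    head = body[:512].lstrip().lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")


def download_after_login(username: str, password: str, out_path: str) -> None:
    """Log in, then fetch the CSV with the same cookies and save it."""
    # The opener keeps cookies between requests, so any session cookie set by
    # the login response is sent along with the download request.
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

    # Hypothetical form fields; passing data makes this a POST request.
    form = urllib.parse.urlencode({"username": username, "password": password})
    opener.open(LOGIN_URL, data=form.encode("utf-8")).close()

    with opener.open(CSV_URL) as resp:
        content_type = resp.headers.get("Content-Type", "")
        body = resp.read()

    # Refuse to save a login or error page under a .csv name.
    if looks_like_html(content_type, body):
        raise RuntimeError("Received an HTML page instead of CSV; login probably failed")

    with open(out_path, "wb") as fh:
        fh.write(body)
```

Calling download_after_login('your_username', 'your_password', 'MFID200911.csv') would then either write the file or raise, rather than silently saving a web page.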

Or you can use an automation tool, such as webbot, to simulate user navigation:

from webbot import Browser

username = 'your_username'
password = 'your_password'

# Log in through the browser, then open the download link
web = Browser()
web.go_to('https://cranedata.com/')
web.type(username, into='username')
web.type(password, into='password')
web.click('Login', tag='login')
web.go_to('https://cranedata.com/publications/download/mfi-daily-data/issue/2020-09-11/.csv')
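
Since only the date changes in the link, the URLs for a whole date range can be generated up front. The helper below is a small sketch: the name issue_downloads is made up for illustration, the URL shape is copied from the link above, and the file name follows the MFID200911 pattern from the question.

```python
from datetime import date, timedelta
from typing import Iterator, Tuple

BASE_URL = "https://cranedata.com/publications/download/mfi-daily-data/issue/"


def issue_downloads(start: date, end: date) -> Iterator[Tuple[str, str]]:
    """Yield (url, filename) for every calendar day from start to end, inclusive."""
    for offset in range((end - start).days + 1):
        day = start + timedelta(days=offset)
        url = f"{BASE_URL}{day:%Y-%m-%d}/.csv"
        # File name follows the 'MFID200911' pattern used in the question.
        filename = f"MFID{day:%y%m%d}.csv"
        yield url, filename
```

After logging in as shown above, each generated URL can be passed to web.go_to(url) in a for loop. Some dates may have no published file, so it is worth checking what each request actually returns.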
