AReddy

Reputation: 213

Unable to download a CSV file from a URL with a Python script

I am accessing a URL that requires a username and password. I want to download a CSV file from it and save the file with today's date and time in its name. There is only one download link on the page.

Is there a way to do this with Python?

I am using the script below, and I can see the print output. But how can I download the file behind the "download CSV" button on the web page? Normally, when I click that button, the browser asks me to save the file.

import re
import requests
from bs4 import BeautifulSoup

url = 'https://url.com'
login_data = dict(login='[email protected]', password='password-g')
session = requests.session()

link = 'https://url.com'

r = requests.get(link)
soup = BeautifulSoup(r.text, "html.parser")

for i in soup.find_all('a', {'class': "app-btn-down"}):
    print(re.search('http://.*\b_file', i.get('href')).group(0)) # the CSV file name is b_file
    print ("r.text")

I'm new to Python, so please forgive my poor explanation.

Upvotes: 0

Views: 976

Answers (1)

t.m.adam

Reputation: 15376

This is mostly pseudocode since I don't know the page's HTML, but I think you'll get the idea.

First, submit your login data so that the session receives the necessary cookies (you can inspect them with s.cookies). Keep in mind that the form may require more fields than login and password, for example a hidden CSRF token. Use this same session for all subsequent requests.
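To illustrate the "more fields" point: many login forms include hidden inputs, such as a CSRF token, that must be posted back along with the credentials. The sketch below assumes the login page contains a single `<form>`. The helper name `build_login_payload` is hypothetical, and the field names `login` and `password` come from the question, so they may differ on the real site. It is only an approximation of what a browser submits, because it ignores checkboxes, selects and multiple submit buttons.

```python
from bs4 import BeautifulSoup


def build_login_payload(form_html, username, password,
                        user_field='login', password_field='password'):
    """Collect the named <input> values of the first <form>, then add credentials.

    Hypothetical helper: hidden fields (e.g. a CSRF token) keep the values the
    server put in the page; user_field/password_field are assumed names.
    """
    soup = BeautifulSoup(form_html, 'html.parser')
    form = soup.find('form')
    if form is None:
        raise ValueError('no <form> found in the login page')
    payload = {}
    for field in form.find_all('input'):
        name = field.get('name')
        if name:  # browsers do not submit inputs without a name
            payload[name] = field.get('value', '')
    # the credentials override whatever (usually empty) values the form had
    payload[user_field] = username
    payload[password_field] = password
    return payload
```

With a `requests.Session` `s`, you would fetch the login page with `s.get()`, pass its `.text` to this function, and `s.post()` the resulting dict to the form's `action` URL.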

Then you can extract the CSV link with BeautifulSoup, assuming the link isn't generated by JavaScript. If it is, you may have to use Selenium.
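If the link is present in the static HTML, the extraction step can be isolated and checked against a small HTML sample. This sketch follows the question in assuming that the button is an `<a>` with class `app-btn-down` and that `b_file` appears in its `href`. `find_csv_link` is a hypothetical helper name. When no matching link exists, it returns `None` instead of raising, which can be a hint that the link is built by JavaScript.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def find_csv_link(html, page_url, marker='b_file'):
    """Return the absolute URL of the first app-btn-down link whose href contains marker.

    Hypothetical helper; returns None when no such link is in the static HTML.
    """
    soup = BeautifulSoup(html, 'html.parser')
    # the href filter may receive None for <a> tags without an href, so guard it
    tag = soup.find('a', class_='app-btn-down',
                    href=lambda h: h is not None and marker in h)
    if tag is None:
        return None
    # resolve relative hrefs such as "/export/b_file.csv" against the page URL
    return urljoin(page_url, tag['href'])
```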

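import os
from time import gmtime, strftime
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

s = requests.Session()
url = 'https://url.com'  # login form URL
login_data = dict(login='[email protected]', password='password-g')
s.post(url, data=login_data).raise_for_status()  # the session now holds the login cookies

link = 'https://url.com'  # page that contains the download button
r = s.get(link)
soup = BeautifulSoup(r.text, "html.parser")

# the href filter may receive None for <a> tags without an href, so guard the substring test
csv_link = soup.find('a', {'class': 'app-btn-down', 'href': lambda h: h is not None and 'b_file' in h})['href']
csv_link = urljoin(link, csv_link)  # handle relative links such as "/export/b_file"
csv_file = s.get(csv_link).content  # raw bytes, so the file's encoding is preserved

Finally, get the current date and time with gmtime() and format it with strftime() to build the filename. Note that gmtime() returns UTC; use localtime() instead if you want your local time.

date_time = strftime("%Y-%m-%d_%H-%M-%S", gmtime())
path = os.path.join('/some/dir', date_time + '.csv')
with open(path, 'wb') as f:  # binary mode, because csv_file holds bytes
    f.write(csv_file)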
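If you prefer a reusable helper, you can wrap the naming and saving step in a function. This is a sketch: `save_with_timestamp` is a hypothetical name. Unlike the `gmtime()` (UTC) call above, it uses `datetime.now()` (local time) by default. It also appends a `.csv` extension and creates the target directory if needed.

```python
import os
from datetime import datetime


def save_with_timestamp(data, directory, now=None):
    """Write raw bytes to <directory>/<YYYY-mm-dd_HH-MM-SS>.csv and return the path.

    Hypothetical helper; `now` defaults to the current local time.
    """
    if now is None:
        now = datetime.now()
    filename = now.strftime('%Y-%m-%d_%H-%M-%S') + '.csv'
    os.makedirs(directory, exist_ok=True)  # create the target folder if needed
    path = os.path.join(directory, filename)
    with open(path, 'wb') as f:  # binary mode keeps the server's bytes unchanged
        f.write(data)
    return path
```

For example, `save_with_timestamp(s.get(csv_link).content, '/some/dir')` would write the download into `/some/dir` with the current timestamp as its name.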

Upvotes: 1
