Reputation: 3
I'm a very junior developer, tasked with automating the creation, download, and transformation of a Stripe Sigma query.
I've been able to get the bulk of the job done: I have daily scheduled queries that generate a report for the prior 24 hours, linked to a dummy account used purely for those reports, and the transformation and reporting on the back half of the problem are finished.
The roadblock I've run into is getting this code to pull the CSV that clicking the link manually downloads.
import traceback

import requests
from requests.auth import HTTPDigestAuth
from bs4 import BeautifulSoup
from imbox import Imbox  # pip install imbox

# host, username, and password are defined earlier in the notebook
mail = Imbox(host, username=username, password=password, ssl=True,
             ssl_context=None, starttls=False)
messages = mail.messages(unread=True)

message_list = []
for uid, message in messages:
    body = str(message.body.get('html'))
    message_list.append(body)
mail.logout()

def get_download_link(message):
    # The download link is the second anchor in the notification email
    soup = BeautifulSoup(message, 'html.parser')
    urls = [link.get('href') for link in soup.find_all('a')]
    return urls[1]

dl_urls = [get_download_link(m) for m in message_list]

for url in dl_urls:
    print(url)
    try:
        s = requests.Session()
        s.auth = (username, password)
        response = s.get(url, allow_redirects=True)
        if response.status_code == requests.codes.ok:
            print('response headers', response.headers['content-type'])
            response = requests.get(url, allow_redirects=True,
                                    auth=HTTPDigestAuth(username, password))
            print(response.content)
            # open(filename, 'wb').write(response.content)
        else:
            print('invalid status code', response.status_code)
    except Exception:
        print('problem with url', url)
        traceback.print_exc()
I'm working on this in Jupyter notebooks, and I've tried to include only the relevant code: how I got into the email, how I extracted URLs from it, and which URL, when clicked, downloads the CSV.
All the way to the last step I've had remarkably good luck, but now the URL that downloads the CSV when I click it manually is instead returning the HTML of a Stripe page when fetched from python/requests.
I've tried poking around in the headers. The one header suggested in another post ('Content-Disposition') isn't present, and printing the headers that are present takes up a good 20-25 lines.
Any suggestions on either headers that could contain the CSV, or other approaches I could take, would be appreciated.
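For context, the mismatch shows up in the response's Content-Type: the manual click yields a CSV, while requests gets back an HTML page. A minimal check like this (the helper name is illustrative, not from my working code) makes the symptom easy to spot without printing all 20-25 header lines:

```python
def looks_like_csv(headers):
    """Heuristic: a CSV download is served as text/csv (or a generic
    binary type); an HTML content type usually means we were redirected
    to a login or interstitial page instead of the file."""
    ctype = headers.get("Content-Type", "").lower()
    return "text/csv" in ctype or "application/octet-stream" in ctype

# Example: pass response.headers from a requests response
print(looks_like_csv({"Content-Type": "text/csv; charset=utf-8"}))  # True
print(looks_like_csv({"Content-Type": "text/html; charset=utf-8"}))  # False
```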
I've included a (intentionally broken) URL to show the rough format of what is working for manual download, not working when kept in python entirely.
Upvotes: 0
Views: 330
Reputation: 1
You can also set up a Stripe webhook for the Stripe Sigma query run, pointing that webhook notification to a Zap webhook trigger. Then:
Upvotes: 0
Reputation: 7439
If you're using scheduled queries, you can receive a notification about completion/availability as a webhook and then access the file programmatically using the url included in the event payload. You can also list/retrieve scheduled query runs via the API and check their status before accessing the data at the file link.
There should be no need to parse these links out of an email.
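A rough sketch of that flow, assuming the webhook delivers a `sigma.scheduled_query_run.created`-style event whose payload carries a `file` object with a `url` (check your own webhook logs for the exact shape, and verify the download auth scheme against Stripe's docs):

```python
import requests


def extract_file_url(event):
    """Pull the result-file URL out of a Sigma scheduled-query-run event.

    The payload shape here is an assumption based on Stripe's event
    envelope (event -> data -> object); confirm it against a real event.
    Returns None if the event isn't the expected type or has no file.
    """
    if event.get("type") != "sigma.scheduled_query_run.created":
        return None
    run = event.get("data", {}).get("object", {})
    file_info = run.get("file") or {}
    return file_info.get("url")


def download_query_result(event, api_key):
    """Fetch the CSV bytes at the file URL.

    Authenticating with the secret key as a bearer token is an
    assumption here; adjust to whatever auth the file endpoint requires.
    """
    url = extract_file_url(event)
    if url is None:
        return None
    resp = requests.get(url, headers={"Authorization": f"Bearer {api_key}"})
    resp.raise_for_status()
    return resp.content  # raw CSV bytes, ready to write to disk
```

This removes the email-parsing step entirely: the webhook handler receives the event, extracts the URL, and downloads the file in one pass.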
Upvotes: 0