Scott

Reputation: 11

Downloading a Text File from a Public SharePoint Link using Requests in Python

I'm trying to automate downloading a text file from a shared link that was sent to me by email. The original link points to a folder containing two files, but I got the direct download link for the file I need, which is:

https://'abc'-my.sharepoint.com/personal/gamma_'abc'/_layouts/15/download.aspx?UniqueId=a0db276e%2Ddf75%2D49b7%2Db671%2D1c49e365ef3f

When I enter the URL above into a web browser, I get a pop-up offering to open or download the file. I'm trying to write some Python code to download the file automatically, and this is what I've come up with so far:

import requests

# SharePoint download link (tenant and user names anonymised as <abc>)
url = ("https://<abc>-my.sharepoint.com/personal/gamma_<abc>/_layouts/15"
       "/download.aspx?UniqueId=a0db276e%2Ddf75%2D49b7%2Db671%2D1c49e365ef3f")

# Browser-like headers copied from Firefox
hdr = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:94.0) Gecko/20100101 Firefox/94.0',
       'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
       'Accept-Encoding': 'gzip, deflate, br',
       'Accept-Language': 'en-US,en;q=0.5',
       'Upgrade-Insecure-Requests': '1',
       'Sec-Fetch-Dest': 'document',
       'Sec-Fetch-Mode': 'navigate',
       'Sec-Fetch-Site': 'none',
       'Sec-Fetch-User': '?1',
       'Connection': 'keep-alive'}

myfile = requests.get(url, headers=hdr)

# Save the raw response body, whatever it turns out to be
with open('c:/users/scott/onedrive/desktop/gamma.las', 'wb') as f:
    f.write(myfile.content)

I originally tried without the User-Agent header, and when I opened gamma.las it contained only "403 FORBIDDEN". If I send the headers as well, the file contains the HTML for what looks like a Microsoft login page, so I assume I'm missing some authentication step.
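Before writing anything to disk, it can help to tell these failure modes apart from a real download. A minimal sketch, using a hypothetical helper `classify_download`; its checks (status code, `Content-Type`, a leading `<html` tag in the body) are heuristics assumed here, not anything SharePoint documents:

```python
def classify_download(status_code, content_type, body):
    """Roughly classify an HTTP response before saving it.

    Returns "forbidden", "error", "html_page" or "file". These are
    heuristics (hypothetical helper), not an official SharePoint contract.
    """
    if status_code in (401, 403):
        return "forbidden"
    if not 200 <= status_code < 300:
        return "error"
    # Ignore parameters such as "; charset=utf-8"
    media_type = (content_type or "").split(";")[0].strip().lower()
    # Look only at the start of the body for an HTML document
    head = body[:512].lstrip().lower()
    if media_type == "text/html" or head.startswith((b"<!doctype html", b"<html")):
        return "html_page"
    return "file"
```

With the code in the question it would be called as `classify_download(myfile.status_code, myfile.headers.get('Content-Type'), myfile.content)`; anything other than `"file"` means the response should not be saved as gamma.las.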

I have no affiliation with this organization. Someone emailed me this link so that I could download a text file; that works fine in the browser but not from Python. I don't log in to anything to get it, as I have no username or password for this domain.

Can I do this using Requests? If not, can I use the SharePoint REST API for this company's SharePoint without user credentials?

Upvotes: 1

Views: 762

Answers (2)

sumitkanoje

Reputation: 1245

This is how you can do it:

import mimetypes

import requests

# SharePoint sharing link to the file
file_url = 'https://organisarion-my.sharepoint.com/:b:/p/user1/Eej3XCFj7N1AqErjlxrzebgBO7NJMV797ClDPuKkBEi6zg?e=dJf2tJ'

# Destination filename (an extension is appended below if one can be guessed)
save_path = 'file'

# GET the sharing link, following redirects to the OneDrive viewer page
res = requests.get(file_url, allow_redirects=True)

if res.status_code == 200:
    # Keep the final redirect URL and collect cookies from every hop;
    # cookies from the final response are applied last so they win
    new_url = res.url
    cookies = {}
    for r in res.history:
        cookies.update(r.cookies.get_dict())
    cookies.update(res.cookies.get_dict())

    # Turn the viewer URL (onedrive.aspx?id=...) into a direct download URL
    new_url = new_url.replace("onedrive.aspx", "download.aspx").replace("?id=", "?SourceUrl=")

    # Request the file itself, sending the collected cookies
    response = requests.get(new_url, cookies=cookies)

    if response.status_code == 200:
        # Drop parameters such as "; charset=utf-8" before guessing an extension
        content_type = response.headers.get('Content-Type', '').split(';')[0].strip()
        print(content_type)
        file_extension = mimetypes.guess_extension(content_type) if content_type else None
        if file_extension:
            destination_with_extension = f"{save_path}{file_extension}"
        else:
            destination_with_extension = save_path

        with open(destination_with_extension, 'wb') as file:
            for chunk in response.iter_content(1024):
                file.write(chunk)
        print("File downloaded successfully!")
    else:
        print("Failed to download the file.")
        print(response.status_code)
else:
    print("Failed to open the sharing link.")
    print(res.status_code)

In short: GET the sharing link, collect the cookies and the final redirect URL, rewrite that URL to point at download.aspx, and send the cookies with a second GET request.
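The same idea can be sketched with a `requests.Session`, which keeps the cookies set during the redirect chain, so they don't have to be gathered from `res.history` by hand. This is a sketch, not a tested recipe: `to_download_url` and `download_shared_file` are hypothetical names, and it assumes the redirect lands on a viewer URL of the form `.../onedrive.aspx?id=...`, which is the shape the `str.replace` calls above expect. The rewrite is limited to the end of the path and the start of the query string.

```python
from urllib.parse import urlsplit, urlunsplit


def to_download_url(viewer_url):
    """Rewrite an assumed .../onedrive.aspx?id=... URL into .../download.aspx?SourceUrl=..."""
    parts = urlsplit(viewer_url)
    path = parts.path
    if path.endswith("/onedrive.aspx"):
        path = path[: -len("onedrive.aspx")] + "download.aspx"
    query = parts.query
    if query.startswith("id="):
        query = "SourceUrl=" + query[len("id="):]
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


def download_shared_file(share_url, save_path, chunk_size=8192):
    """Follow a sharing link, then fetch the rewritten download URL (hypothetical helper)."""
    # Imported here so to_download_url() works even without requests installed
    import requests

    with requests.Session() as session:
        # The session stores cookies from every redirect hop automatically
        first = session.get(share_url)
        first.raise_for_status()
        response = session.get(to_download_url(first.url), stream=True)
        response.raise_for_status()
        with open(save_path, "wb") as out:
            for chunk in response.iter_content(chunk_size=chunk_size):
                out.write(chunk)
```

For example, with a made-up viewer URL, `to_download_url("https://contoso-my.sharepoint.com/personal/user1/_layouts/15/onedrive.aspx?id=%2Fpersonal%2Fuser1%2FDocuments%2Fgamma.las")` gives the same URL with `download.aspx?SourceUrl=` in place of `onedrive.aspx?id=`.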

Upvotes: 1

Nikolay

Reputation: 12245

Not having to supply your credentials most probably means you are (implicitly) using built-in Windows authentication in your organization. See whether this helps: Handling windows authentication while accessing url using requests

The Python library mentioned there for handling built-in Windows authentication is requests-negotiate-sspi. I'm not sure it will work with federation (your site ends with ".sharepoint.com", which means you are probably using federation as well), but it may be worth trying.

So I would try something like this (I doubt the headers really matter in your case, but you could try adding them as well):

import requests
from requests_negotiate_sspi import HttpNegotiateAuth

url = ...  # the SharePoint download URL

# Authenticate as the current Windows user via SSPI (Negotiate)
myfile = requests.get(url, auth=HttpNegotiateAuth())
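A slightly fuller sketch that also refuses to save an error or an HTML login page. It assumes Windows with `requests` and `requests-negotiate-sspi` installed; `save_with_windows_auth` is a hypothetical helper name, and treating a `text/html` response as a failure is an assumption about how the file is served.

```python
import os


def save_with_windows_auth(url, save_path, chunk_size=8192):
    """Download url as the current Windows user and save it (hypothetical helper).

    Returns the number of bytes written.
    """
    # Imported here because requests-negotiate-sspi only installs on Windows
    import requests
    from requests_negotiate_sspi import HttpNegotiateAuth

    response = requests.get(url, auth=HttpNegotiateAuth(), stream=True)
    response.raise_for_status()  # raises for 4xx/5xx, e.g. 401 or 403
    content_type = response.headers.get("Content-Type", "")
    if content_type.startswith("text/html"):
        # A login page came back instead of the file; don't save it
        raise RuntimeError("received an HTML page instead of the file")
    with open(save_path, "wb") as out:
        for chunk in response.iter_content(chunk_size=chunk_size):
            out.write(chunk)
    return os.path.getsize(save_path)
```

Streaming with `iter_content` avoids holding the whole file in memory, which matters little for a small text file but costs nothing here.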

Upvotes: 0
