O Ganter

Reputation: 77

How to download a file with authentication?

I'm working with the website 'musescore.com' that has many files in the '.mxl' format that I need to download automatically with Python.

Each file on the website has a unique ID number. Here's a link to an example file:

https://musescore.com/user/43726/scores/76643

The last number in the URL is the ID number for this file. I have no idea where on the website the mxl file for a score is located, but I know that to download the file, one must visit this URL:

https://musescore.com/score/76643/download/mxl

This link is the same for every file, but with that file's particular ID number in it. As I understand it, this url executes code that downloads the file, and is not an actual path to the file.

Here's my code:

import requests

url = 'https://musescore.com/score/76643/download/mxl'
user = 'myusername'
password = 'mypassword'

r = requests.get(url, auth=(user, password), stream=True)
with open('file.mxl', 'wb') as f:
  for chunk in r.iter_content(chunk_size=1024):
    f.write(chunk)

Instead of the mxl file for this score, this code downloads a webpage saying I need to sign in to download the file. This must mean I am not authenticating with the website correctly. How can I fix this?

Upvotes: 0

Views: 3198

Answers (1)

cody

Reputation: 11157

By passing an auth parameter to get, you're attempting to use HTTP Basic Authentication, which is not what this particular site uses. You'll need to use an instance of requests.Session to post to their login endpoint and keep the cookie(s) that result from that process.
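To illustrate why the auth parameter has no effect here, this is a minimal sketch of what HTTP Basic Authentication amounts to: a single Authorization header built from the credentials. The helper name basic_auth_header is made up for this illustration and uses only the standard library; it is not part of requests. A site that logs users in through a form and a session cookie will simply ignore this header.

```python
import base64


def basic_auth_header(user: str, password: str) -> str:
    """Build the Authorization header value used by HTTP Basic Authentication.

    Hypothetical helper for illustration only: it mirrors the header that
    an HTTP client sends when given a (user, password) pair for Basic auth.
    """
    # Credentials are joined with a colon and base64-encoded (not encrypted).
    token = base64.b64encode(f"{user}:{password}".encode("latin1")).decode("ascii")
    return f"Basic {token}"


# The placeholder credentials from the question.
print(basic_auth_header("myusername", "mypassword"))
```

Nothing in this header involves a login form or a cookie, which is why the server still answers with its sign-in page.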

Additionally, this site utilizes a csrf token that you must first extract from the login page in order to include it with your post to the login endpoint.

Here is a working example. You will need to replace the username and password with your own:

import requests
from bs4 import BeautifulSoup

s = requests.Session()
r = s.get('https://musescore.com/user/login')

soup = BeautifulSoup(r.content, 'html.parser')
csrf = soup.find('input', {'name': '_csrf'})['value']

s.post('https://musescore.com/user/auth/login/process', data={
    'username': 'user@example.com',
    'password': 'secret',
    '_csrf': csrf,
    'op': 'Log in'
})

r = s.get('https://musescore.com/score/76643/download/mxl')

print(f"Status: {r.status_code}")
print(f"Content-Type: {r.headers['content-type']}")

Result, with content type showing it is successfully downloading the file:

Status: 200
Content-Type: application/vnd.recordare.musicxml
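To actually write the file to disk, as the question's code tried to do, one possible approach is sketched below. The function name save_download and its parameters are hypothetical, and the default expected content type is simply the one shown in the output above. It checks the status code and content type before writing, so a sign-in page is not silently saved as file.mxl.

```python
def save_download(response, path,
                  expected_type="application/vnd.recordare.musicxml"):
    """Write a streamed response body to `path` if it looks like the score file.

    Hypothetical helper: `response` is assumed to behave like a
    requests.Response (status_code, headers, iter_content).
    """
    content_type = response.headers.get("content-type", "")
    # A login page would typically come back as text/html, so refuse to save it.
    if response.status_code != 200 or not content_type.startswith(expected_type):
        raise ValueError(
            f"unexpected response: status {response.status_code}, "
            f"content type {content_type!r}"
        )
    with open(path, "wb") as f:
        # Write the body in chunks so large files are not held in memory.
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)
    return path
```

With the logged-in session s from the example above, it could be called as save_download(s.get('https://musescore.com/score/76643/download/mxl', stream=True), 'file.mxl').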

Upvotes: 1
