WJA

Reputation: 7004

Downloading .pdf using requests results in corrupted file

I have a link to a PDF file that I would like to download. I tried the following:

import requests

class Scraper:

    def __init__(self):
        """Init the class"""

    @staticmethod
    def download(full_url):
        """Download full url pdf"""
        with requests.Session() as req:

            # Init
            r = req.get(full_url, allow_redirects=True)
            localname = 'test.pdf'

            # Download
            if r.status_code == 200: #and r.headers['Content-Type'] == "application/pdf;charset=UTF-8":
                with open(f"{localname}", 'wb') as f:
                    f.write(r.content)
            else:
                pass

However, after downloading, when I try to open it on my computer I receive the message:

"Could not open [FILENAME].pdf because it is either not a supported file type or because the file has been damaged (...)"

Upvotes: 1

Views: 1019

Answers (1)

Your request is missing the parameters that start the download. If you navigate to the URL in a browser, you will see that you have to click Continue before the download begins. What happens in the background is a GET request to the back end with the query string ?switchLocale=y&siteEntryPassthrough=true. Without those parameters, the server presumably returns the intermediate page rather than the PDF, so the file you save cannot be opened as one.

You can see this request in your browser's developer tools, under the Network tab.

import requests


params = {
    'switchLocale': 'y',
    'siteEntryPassthrough': 'true'
}


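def main(url, params):
    # The query parameters replicate the "Continue" click that starts the download
    r = requests.get(url, params=params)
    r.raise_for_status()  # stop on an HTTP error instead of saving an error page
    # Write the raw response bytes to disk
    with open("test.pdf", 'wb') as f:
        f.write(r.content)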


main("https://www.blackrock.com/uk/individual/literature/annual-report/blackrock-index-selection-fund-en-gb-annual-report-2019.pdf", params)
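If you want to guard against saving a non-PDF response again, one option is to check the payload before writing it. Below is a minimal sketch that relies on the fact that PDF files start with the signature %PDF-. The helper name save_if_pdf and the 1 KiB search window are my own assumptions, not part of requests:

```python
from pathlib import Path

PDF_MAGIC = b"%PDF-"  # signature found at the start of a PDF file


def save_if_pdf(content: bytes, path: str) -> bool:
    """Write content to path only if it looks like a PDF.

    Returns True if the file was written, False otherwise.
    (Hypothetical helper for illustration.)
    """
    # Some PDFs carry a few stray bytes before the header, so search
    # the first 1 KiB instead of requiring the signature at offset 0.
    if PDF_MAGIC not in content[:1024]:
        return False
    Path(path).write_bytes(content)
    return True


# Possible usage with the response from the code above:
#   r = requests.get(url, params=params)
#   if not save_if_pdf(r.content, "test.pdf"):
#       print("Got", r.headers.get("Content-Type"), "instead of a PDF")
```

Printing the Content-Type header when the check fails usually tells you what the server sent instead.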

Upvotes: 2
