Reputation: 7004
I have a link to a PDF file that I would like to download. I tried the following:
import requests

class Scraper:
    def __init__(self):
        """Init the class"""

    @staticmethod
    def download(full_url):
        """Download full url pdf"""
        with requests.Session() as req:
            # Init
            r = req.get(full_url, allow_redirects=True)
            localname = 'test.pdf'
            # Download
            if r.status_code == 200:  # and r.headers['Content-Type'] == "application/pdf;charset=UTF-8":
                with open(f"{localname}", 'wb') as f:
                    f.write(r.content)
            else:
                pass
However, after downloading, when I try to open it on my computer I receive the message:
"Could not open [FILENAME].pdf because it is either not a supported file type or because the file has been damaged (...)"
Upvotes: 1
Views: 1019
Reputation: 11515
You haven't passed the parameters required to start the download. If you navigate to the URL in a browser, you will see that you need to click Continue before the download starts. What happens in the background is a GET request to the back end with the following query string, which starts the download:
?switchLocale=y&siteEntryPassthrough=true
You can see this request in the Network tab of your browser's developer tools.
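To make the role of those parameters concrete, here is a minimal standard-library sketch of how a dictionary of parameters becomes the query string shown above. The helper name `with_query` and the `example.com` URL are placeholders for illustration only, and the sketch assumes the URL has no query string of its own yet.

```python
from urllib.parse import urlencode


def with_query(url: str, params: dict) -> str:
    """Append params to url as a query string (assumes url has no query yet)."""
    return f"{url}?{urlencode(params)}"


params = {"switchLocale": "y", "siteEntryPassthrough": "true"}

# Placeholder URL, used only to show the resulting shape.
print(with_query("https://example.com/report.pdf", params))
# prints https://example.com/report.pdf?switchLocale=y&siteEntryPassthrough=true
```

Passing `params=params` to `requests.get` performs this encoding for you, so there is no need to build the string by hand.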
import requests

params = {
    'switchLocale': 'y',
    'siteEntryPassthrough': 'true'
}

def main(url, params):
    r = requests.get(url, params=params)
    with open("test.pdf", 'wb') as f:
        f.write(r.content)

main("https://www.blackrock.com/uk/individual/literature/annual-report/blackrock-index-selection-fund-en-gb-annual-report-2019.pdf", params)
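A status code of 200 does not guarantee that the body is a PDF. An interstitial HTML page is one way to end up with a file that will not open. As an optional safeguard, the sketch below writes the bytes only when they begin with the `%PDF-` signature that PDF files start with. The names `looks_like_pdf` and `save_if_pdf` are hypothetical helpers, not part of `requests`.

```python
from pathlib import Path

PDF_SIGNATURE = b"%PDF-"  # PDF files begin with this marker


def looks_like_pdf(data: bytes) -> bool:
    """Return True if the payload starts with the PDF signature."""
    return data.startswith(PDF_SIGNATURE)


def save_if_pdf(data: bytes, path: str) -> bool:
    """Write data to path only when it looks like a PDF.

    Returns True if the file was written, False otherwise.
    """
    if not looks_like_pdf(data):
        return False
    Path(path).write_bytes(data)
    return True
```

With the code above, this could be called as `save_if_pdf(r.content, "test.pdf")` in place of the unconditional write. A `False` result would suggest the server returned something other than the PDF.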
Upvotes: 2