import requests
pdf_url = "https://www.alexandrina.sa.gov.au/__data/assets/pdf_file/0028/1619614/Council-Special-Meeting-Agenda-11-June-2024.pdf"
pdf_path = 'Test.pdf'
response = requests.get(pdf_url)
pdf_content = response.content
with open(pdf_path, 'wb') as pdf_file:
    pdf_file.write(pdf_content)
Using this code, I am not able to download the PDF because the request gets a 403 response. When I open the URL manually in Chrome, it opens and I can also download it locally. With the requests module, however, I can't download it. If I use a proxy or a scraping service, the file does download, but it is corrupted, so I can't open it. Can you please help me figure out what I should do?
There seems to be no issue with your code. I just changed it to a different PDF URL, and it works fine.
import os
import requests
save_dir = os.getcwd()
file_name = 'test.pdf'
#url = 'https://www.alexandrina.sa.gov.au/__data/assets/pdf_file/0028/1619614/Council-Special-Meeting-Agenda-11-June-2024.pdf'
url2 = 'https://bitcoin.org/bitcoin.pdf'
outfile = os.path.join(save_dir, file_name)
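response = requests.get(url2, stream=True)
response.raise_for_status()  # raise on 403/404 instead of silently saving an error page
with open(outfile, 'wb') as output:
    # with stream=True, write the body in chunks instead of loading it all into memory
    for chunk in response.iter_content(chunk_size=8192):
        output.write(chunk)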
As someone mentioned here, the server hosting this PDF blocks downloads made from code in order to keep bots out. Instead of the PDF, the web server is serving you a web page intended to prevent bots from downloading data from the site. If you save that page under a .pdf name, the result looks like a corrupted PDF.
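A minimal sketch, assuming the blocked response is an HTML page as described above: before writing the file, check that the bytes really start with the PDF header `%PDF-`, so a bot-check page is reported instead of being saved as a "corrupted" PDF. The helper names `looks_like_pdf` and `save_if_pdf` are hypothetical and not part of requests.

```python
from pathlib import Path

# Normal PDF files begin with this header line, e.g. b"%PDF-1.7"
PDF_MAGIC = b"%PDF-"


def looks_like_pdf(data: bytes) -> bool:
    """Return True if the downloaded bytes start with the PDF header."""
    return data.startswith(PDF_MAGIC)


def save_if_pdf(data: bytes, path: str) -> None:
    """Write data to path only if it looks like a PDF; otherwise raise ValueError."""
    if not looks_like_pdf(data):
        # Show the start of what the server actually sent (often an HTML page)
        preview = data[:80].decode("utf-8", errors="replace")
        raise ValueError(f"Response is not a PDF; it starts with: {preview!r}")
    Path(path).write_bytes(data)
```

In the question's code you would call `save_if_pdf(response.content, pdf_path)` instead of writing `pdf_content` directly. If the server returns its bot-protection page, you get a `ValueError` showing the first bytes of that page rather than an unreadable `Test.pdf`.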