sebasmps

Reputation: 11

Download PDF files without a .pdf URL

I am trying to download PDF files from this website.

I am new to Python and am currently learning the language. I have downloaded packages such as urllib and bs4. However, there is no .pdf extension in any of the URLs. Instead, each one has the following format: http://www.smv.gob.pe/ConsultasP8/documento.aspx?vidDoc={.....}.

I have tried using soup.find_all, but it was not successful:

from urllib import request
from bs4 import BeautifulSoup
import re
import os
import urllib

url="http://www.smv.gob.pe/frm_hechosdeImportanciaDia?data=38C2EC33FA106691BB5B5039DACFDF50795D8EC3AF"
response = request.urlopen(url).read()
soup= BeautifulSoup(response, "html.parser")    
links = soup.find_all('a', href=re.compile(r'(http://www.smv.gob.pe/ConsultasP8/documento.aspx?)'))
print(links)

Upvotes: 1

Views: 703

Answers (1)

Sergio Pulgarin

Reputation: 929

This works for me:

import re

import requests
from bs4 import BeautifulSoup

url = "http://www.smv.gob.pe/frm_hechosdeImportanciaDia?data=38C2EC33FA106691BB5B5039DACFDF50795D8EC3AF"
# Fetch the page and parse the HTML
response = requests.get(url).content
soup = BeautifulSoup(response, "html.parser")
# Match only anchors whose href points at the documento.aspx handler
links = soup.find_all('a', href=re.compile(r'(http://www.smv.gob.pe/ConsultasP8/documento.aspx?)'))
# Keep just the href attribute of each matched Tag
links = [l['href'] for l in links]
print(links)

The only difference is that I use requests because I'm used to it, and I take the href attribute from each Tag returned by BeautifulSoup.
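If the goal is to actually save the PDFs, here is a minimal sketch of how you might follow up. It assumes each matched href is an absolute URL that returns the PDF bytes directly, and it uses the vidDoc query value as a filename purely for illustration:

import re

import requests
from bs4 import BeautifulSoup

url = "http://www.smv.gob.pe/frm_hechosdeImportanciaDia?data=38C2EC33FA106691BB5B5039DACFDF50795D8EC3AF"
soup = BeautifulSoup(requests.get(url).content, "html.parser")
links = [a['href'] for a in
         soup.find_all('a', href=re.compile(r'ConsultasP8/documento\.aspx\?'))]

for link in links:
    # Hypothetical filename: take whatever follows vidDoc= in the query string.
    vid = link.split("vidDoc=")[-1]
    # Even though the URL has no .pdf extension, the response body should be
    # the PDF itself, so we write it to disk as-is.
    pdf = requests.get(link)
    with open(f"{vid}.pdf", "wb") as f:
        f.write(pdf.content)

If the hrefs turn out to be relative, you would need to join them against the site root (for example with urllib.parse.urljoin) before requesting them.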

Upvotes: 1
