Reputation: 1
I am trying to download all the files from this website: https://superbancos.gob.pa/es/fin-y-est/reportes-estadisticos
I found this code on another page and I am trying to adapt it to my use case.
Any help would be appreciated.
# Import the libraries here
import requests
from bs4 import BeautifulSoup

# specify the URL of the archive here
archive_url = "https://www.superbancos.gob.pa/es/fin-y-est/reportes-estadisticos"

def get_video_links():
    r = requests.get(archive_url)
    soup = BeautifulSoup(r.content, 'html5lib')
    links = soup.findAll('a')
    video_links = [archive_url + link['href'] for link in links
                   if link['href'].endswith('xlsx')]
    return video_links

def download_video_series(video_links):
    '''Iterate through all links in video_links
    and download them one by one.'''
    for link in video_links:
        # obtain the filename by splitting the URL and taking the last part
        file_name = link.split('/')[-1]
        print("Downloading file: {!s}".format(file_name))
        # create response object
        r = requests.get(link, stream=True)
        # download started
        with open(file_name, 'wb') as f:
            for chunk in r.iter_content(chunk_size=1024*1024):
                if chunk:
                    f.write(chunk)
        print("{!s} downloaded!\n".format(file_name))
    print("All files downloaded!")
    return

if __name__ == "__main__":
    video_links = get_video_links()
    download_video_series(video_links)
But when I run the program, it prints "All files downloaded!" without actually downloading anything.
Upvotes: 0
Views: 65
Reputation: 366
The information you are looking for is loaded dynamically with JavaScript, so you need something that can execute JS and render the page the way your browser does.
The most straightforward way is to use Selenium:
from bs4 import BeautifulSoup
from selenium import webdriver

def get_soup(link):
    driver = webdriver.Chrome()
    driver.get(link)
    soup = BeautifulSoup(driver.page_source, 'html5lib')
    driver.close()
    return soup
So your first function could be rewritten as:

def get_video_links():
    soup = get_soup(archive_url)
    links = soup.findAll('a')
    video_links = [archive_url + link['href'] for link in links
                   if link['href'].endswith('xlsx')]
    return video_links
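One more pitfall, independent of the JavaScript issue: building the final URL with `archive_url + link['href']` only works if every href happens to be relative to that exact page. `urllib.parse.urljoin` from the standard library resolves relative, root-relative, and absolute hrefs correctly. A quick sketch (the file paths here are made up for illustration):

```python
from urllib.parse import urljoin

archive_url = "https://www.superbancos.gob.pa/es/fin-y-est/reportes-estadisticos"

# A root-relative href is resolved against the site root, not glued onto the page URL
print(urljoin(archive_url, "/docs/reporte.xlsx"))
# -> https://www.superbancos.gob.pa/docs/reporte.xlsx

# An already-absolute href is returned unchanged
print(urljoin(archive_url, "https://cdn.example.com/reporte.xlsx"))
# -> https://cdn.example.com/reporte.xlsx
```

So inside the list comprehension you would use `urljoin(archive_url, link['href'])` instead of string concatenation.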
Just make sure to set up your ChromeDriver properly! Here is the documentation.
Upvotes: 2
Reputation: 688
The problem here is that the page requires JavaScript. Your best bet is to use the Selenium WebDriver to handle this instead of bs4 alone:
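To make that concrete, here is a minimal sketch along those lines. The link extraction is a pure function, so it can be tried on static HTML; the Selenium part at the bottom is an untested assumption about the live site and only runs when the script is executed directly. Note the use of `a.get("href", "")` rather than `a['href']`, which would raise a KeyError on anchors that have no href attribute:

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

def extract_xlsx_links(html, base_url):
    """Return absolute URLs of all .xlsx links found in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    # a.get() returns "" for anchors without an href instead of raising KeyError
    return [urljoin(base_url, a.get("href", ""))
            for a in soup.find_all("a")
            if a.get("href", "").endswith(".xlsx")]

if __name__ == "__main__":
    # Hypothetical usage against the live, JS-rendered page
    from selenium import webdriver
    driver = webdriver.Chrome()
    driver.get("https://www.superbancos.gob.pa/es/fin-y-est/reportes-estadisticos")
    links = extract_xlsx_links(driver.page_source, driver.current_url)
    driver.quit()
    print(links)
```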
Upvotes: 0