Preguntador simple

Reputation: 1

Download all files from web using Python error

I am trying to download all the files from this website: https://superbancos.gob.pa/es/fin-y-est/reportes-estadisticos

I found this code on a page and I am trying to adapt it to my process.

If you could help me I would appreciate it.

# Import the libraries here
import requests
from bs4 import BeautifulSoup

# specify the URL of the archive here
archive_url = "https://www.superbancos.gob.pa/es/fin-y-est/reportes-estadisticos"

def get_video_links():
    # fetch the page and parse it
    r = requests.get(archive_url)
    soup = BeautifulSoup(r.content, 'html5lib')

    # collect every <a> tag and keep only the .xlsx links;
    # link.get() avoids a KeyError on anchors without an href
    links = soup.findAll('a')
    video_links = [archive_url + link['href'] for link in links
                   if link.get('href', '').endswith('xlsx')]
    return video_links

def download_video_series(video_links):
    '''Iterate through all links in video_links
    and download them one by one.'''
    for link in video_links:
        # obtain the filename by splitting the URL and taking the last part
        file_name = link.split('/')[-1]
        print("Downloading file: {!s}".format(file_name))

        # stream the response to disk in 1 MB chunks
        r = requests.get(link, stream=True)
        with open(file_name, 'wb') as f:
            for chunk in r.iter_content(chunk_size=1024 * 1024):
                if chunk:
                    f.write(chunk)

        print("{!s} downloaded!\n".format(file_name))

    print("All files downloaded!")


if __name__ == "__main__":
    video_links = get_video_links()
    download_video_series(video_links)

But when I run the program it just prints "All files downloaded!" and doesn't actually download anything.

Upvotes: 0

Views: 65

Answers (2)

M. Abreu

Reputation: 366

The information you are looking for is loaded dynamically with JavaScript, so you need something that can run the JS and render the page the way you see it in the browser.

The most straightforward way is to use Selenium:

from bs4 import BeautifulSoup
from selenium import webdriver

def get_soup(link):
    # render the page in a real browser so the JS-inserted links exist
    driver = webdriver.Chrome()
    driver.get(link)
    soup = BeautifulSoup(driver.page_source, 'html5lib')
    driver.quit()  # quit() also shuts down the chromedriver process
    return soup

So your first function could be rewritten as:

def get_video_links():
    soup = get_soup(archive_url)
    links = soup.findAll('a')
    # link.get() avoids a KeyError on anchors without an href
    video_links = [archive_url + link['href'] for link in links
                   if link.get('href', '').endswith('xlsx')]
    return video_links
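One more thing to watch out for (a side note, not part of the original answer): `archive_url + link['href']` only builds a valid URL when the href happens to be a relative fragment. The standard library's `urllib.parse.urljoin` handles root-relative and already-absolute hrefs as well. The `.xlsx` paths below are illustrative, not actual links on the page:

```python
from urllib.parse import urljoin

archive_url = "https://www.superbancos.gob.pa/es/fin-y-est/reportes-estadisticos"

# a root-relative href is resolved against the site root...
print(urljoin(archive_url, "/documents/reporte.xlsx"))
# -> https://www.superbancos.gob.pa/documents/reporte.xlsx

# ...and an already-absolute href is passed through unchanged
print(urljoin(archive_url, "https://cdn.example.com/reporte.xlsx"))
# -> https://cdn.example.com/reporte.xlsx
```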

Just make sure to set up your ChromeDriver properly! Here is the documentation.

Upvotes: 2

shiny

Reputation: 688

The problem here is that the page requires JavaScript. Your best bet is to use the Selenium webdriver to handle this instead of bs4.
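Whichever driver renders the page, the filtering step itself can be kept driver-agnostic. A small stdlib-only sketch (the helper name and sample hrefs are hypothetical, not from either answer) showing how to resolve the collected href values safely, skipping anchors that have no href at all:

```python
from urllib.parse import urljoin

def collect_xlsx_links(hrefs, base_url):
    """Resolve hrefs against the page URL, keeping only .xlsx files.

    `hrefs` is a list of href values pulled from the rendered page;
    entries may be None for anchors without an href attribute.
    """
    return [urljoin(base_url, h) for h in hrefs
            if h and h.lower().endswith(".xlsx")]

base = "https://www.superbancos.gob.pa/es/fin-y-est/reportes-estadisticos"
print(collect_xlsx_links(["/docs/q1.xlsx", None, "#top"], base))
# -> ['https://www.superbancos.gob.pa/docs/q1.xlsx']
```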

Upvotes: 0
