When I use requests.get(), I get the wrong answer

Question

I don't know much about web development. I just want to download all the zip files from a web page with a Python script. But when I call requests.get(), I only get a placeholder page containing code that loads the real page (at least, that is what I think). Is there any way to load the actual content? My pipeline is:

Load the page with requests.get().
Pass the HTML to BeautifulSoup 4 to extract all the URLs to download.
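Step two of that pipeline, pulling the zip URLs out of the HTML, can be sketched as follows. This version swaps BeautifulSoup for the standard library's `html.parser` so it runs without extra packages; with BeautifulSoup, `soup.find_all('a', href=True)` plays the same role. The names `zip_links` and `sample` are hypothetical, and relative hrefs are resolved against the page URL with `urljoin`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def zip_links(html, base_url):
    """Return absolute URLs of all links ending in .zip (hypothetical helper)."""
    parser = _LinkCollector()
    parser.feed(html)
    parser.close()
    return [urljoin(base_url, href) for href in parser.hrefs
            if href.lower().endswith(".zip")]


sample = ('<a href="a.zip">A</a> <a href="index.html">home</a> '
          '<a href="https://example.com/b.ZIP">B</a>')
print(zip_links(sample, "https://example.com/data/"))
# ['https://example.com/data/a.zip', 'https://example.com/b.ZIP']
```

Fed the placeholder page that requests.get() returns here, it would presumably find nothing, since that HTML contains no zip links until JavaScript has run.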

The web page is https://divvy-tripdata.s3.amazonaws.com/index.html

I could copy the HTML directly from the DOM in the browser, but I really want to understand what I'm doing wrong with the requests call :(

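import requests
from bs4 import BeautifulSoup

page = requests.get('https://divvy-tripdata.s3.amazonaws.com/index.html')
soup = BeautifulSoup(page.content, 'html.parser')
print(soup.prettify())  # prettify() returns a string; it does not modify soup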

What I got:

Bucket loading...

Bucket loading...

Name
Date Modified
Size
Type

StandardIO · Accepted Answer

After reading the discussion that @colidyre pointed to, I solved it as follows.

I used the requests_html library to make the GET request. The first time you call its render method, the library downloads Chromium to your machine. The render method then runs the page's JavaScript in Chromium, so the page is fully rendered.

The library provides two session classes for this:

synchronous: HTMLSession
asynchronous: AsyncHTMLSession

I had to use the asynchronous one; the synchronous version is described in the requests_html docs.
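For reference, a synchronous variant might look like the sketch below. It uses requests_html's `HTMLSession` and `render()`, the blocking counterparts of `AsyncHTMLSession` and `arender()`; the function names are hypothetical. It is meant for a plain script: inside an already-running event loop, such as a Jupyter notebook, `HTMLSession` refuses to render, which is presumably why the async class was needed here.

```python
def filter_zip_links(links):
    """Keep only URLs ending in .zip, sorted for a stable order."""
    return sorted(link for link in links if link.lower().endswith(".zip"))


def fetch_rendered_zip_links(url, wait_seconds=1):
    """Render `url` in headless Chromium and return its .zip links (hypothetical helper).

    Requires `pip install requests-html`; the first render() call downloads Chromium.
    """
    from requests_html import HTMLSession  # imported lazily: heavy optional dependency

    session = HTMLSession()
    try:
        r = session.get(url)
        # sleep= pauses after the initial render so the page's JavaScript can finish.
        r.html.render(sleep=wait_seconds)
        return filter_zip_links(r.html.absolute_links)
    finally:
        session.close()
```

Calling `fetch_rendered_zip_links('https://divvy-tripdata.s3.amazonaws.com/index.html')` from a script would then return the sorted list of zip URLs, assuming the rendered page exposes them as links.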

This approach is simple. Other methods require installing a server, which was overkill for me because I don't need to do this often.
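A further option that needs neither a browser nor a server, sketched here under assumptions: the "Bucket loading..." placeholder and the Name / Date Modified / Size / Type table suggest that index.html builds its file list in JavaScript from the bucket's standard S3 XML listing (`ListBucketResult`). If that is the case and the bucket allows anonymous listing at its root URL (assumed here to be `https://divvy-tripdata.s3.amazonaws.com/`), a plain `requests.get()` on that URL would return the list directly, and the standard library can parse it. The function names and `sample_xml` below are hypothetical; only the parsing is shown.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

# XML namespace of S3 ListBucketResult documents.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def zip_urls_from_listing(xml_data, bucket_url):
    """Build download URLs for every key ending in .zip in an S3 listing (str or bytes)."""
    root = ET.fromstring(xml_data)
    keys = [el.text for el in root.findall("s3:Contents/s3:Key", S3_NS)]
    # quote() escapes spaces and other unsafe characters but keeps "/".
    return [bucket_url + quote(k) for k in keys if k and k.endswith(".zip")]


def listing_is_truncated(xml_data):
    """S3 returns at most 1000 keys per response; IsTruncated says whether more remain."""
    flag = ET.fromstring(xml_data).find("s3:IsTruncated", S3_NS)
    return flag is not None and flag.text == "true"


# Hypothetical, heavily shortened listing used only to exercise the parser.
sample_xml = """<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>example-bucket</Name>
  <IsTruncated>false</IsTruncated>
  <Contents><Key>2020-trips.zip</Key><Size>123</Size></Contents>
  <Contents><Key>index.html</Key><Size>45</Size></Contents>
</ListBucketResult>"""

bucket_url = "https://divvy-tripdata.s3.amazonaws.com/"  # assumed listing endpoint
print(zip_urls_from_listing(sample_xml, bucket_url))
# ['https://divvy-tripdata.s3.amazonaws.com/2020-trips.zip']
print(listing_is_truncated(sample_xml))  # False
```

With the real endpoint this would become `zip_urls_from_listing(requests.get(bucket_url).content, bucket_url)`; if `listing_is_truncated()` returns True, further pages would have to be requested, for example with S3's `marker` query parameter.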

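# Libraries

import os

import requests
from requests_html import AsyncHTMLSession

# Session and request
# Top-level `await` like this works in Jupyter/IPython, where an event loop is
# already running; in a plain script, wrap these calls in an async function.

asession = AsyncHTMLSession()
r = await asession.get('https://divvy-tripdata.s3.amazonaws.com/index.html')
# sleep=1 waits one second after the initial render, giving the page's
# JavaScript time to build the file listing before the HTML is captured.
await r.html.arender(sleep=1)
r.close()

# Processing the links and saving them to a file

links = r.html.absolute_links  # absolute URLs, even if the hrefs are relative

dir_path = "data/"
os.makedirs(dir_path, exist_ok=True)
path_file = os.path.join(dir_path, "url_files.txt")

with open(path_file, mode='w') as url_files:
    for link in links:
        if link.endswith('.zip'):
            url_files.write(link + '\n')

# Download data

with open(path_file, mode='r') as url_file:
    for link in url_file:
        link = link.strip()  # remove the trailing '\n' character
        response = requests.get(link)
        response.raise_for_status()  # stop on HTTP errors instead of saving an error page
        file_name = link.split('/')[-1]
        with open(os.path.join(dir_path, file_name), mode='wb') as out_file:
            out_file.write(response.content)
        print(f'Successfully downloaded file: {file_name}')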
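The download loop above reads each whole zip file into memory via `response.content`. For large files, a streamed variant writes the body to disk in chunks instead. This is a sketch using requests' `stream=True` and `iter_content()`; the helper names, the chunk size, and the skip-if-present behaviour are assumptions, not part of the original answer.

```python
import os
from urllib.parse import unquote, urlparse


def file_name_from_url(url):
    """Last path segment of a URL, ignoring any query string or fragment."""
    return unquote(os.path.basename(urlparse(url).path))


def download_stream(url, dir_path="data/", chunk_size=1024 * 1024):
    """Stream `url` into dir_path chunk by chunk (hypothetical helper).

    Skips files that already exist, and writes to a ".part" file first so an
    interrupted download is not mistaken for a finished one on the next run.
    """
    os.makedirs(dir_path, exist_ok=True)
    target = os.path.join(dir_path, file_name_from_url(url))
    if os.path.exists(target):
        return target
    import requests  # third-party; only needed when something is actually downloaded

    partial = target + ".part"
    with requests.get(url, stream=True, timeout=60) as response:
        response.raise_for_status()
        with open(partial, "wb") as out_file:
            for chunk in response.iter_content(chunk_size=chunk_size):
                out_file.write(chunk)
    os.replace(partial, target)  # rename only once the whole body has been written
    return target
```

In the loop above, `download_stream(link, dir_path)` could take the place of the `requests.get(link)` call and the file write.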
