solopiu

Reputation: 756

Load entire html page in python

I need to store an entire HTML page in a str variable. I'm doing this:

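import requests
from bs4 import BeautifulSoup

url = my_url  # placeholder for the real address
response = requests.get(url)
# Name the parser explicitly so BeautifulSoup does not have to guess one
page = str(BeautifulSoup(response.content, "html.parser"))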

This works, but the page at my_url is not "complete". It is a site where new content loads as you scroll to the bottom, and I need the whole page, not only the part that is visible at first.

Is there a way to load the entire page and then store it?

I also tried loading the page manually and then looking at the source code, but the final part of the page is still not there.

Alternatively, all I want from my_url page are all the links inside it, and all of them are like:

my_url/something/first-post
my_url/something/second-post

Is there another way to find all the links, that is, every URL that starts with "my_url/something/"?

Thanks in advance

Upvotes: 0

Views: 1438

Answers (2)

NoSkillMan

Reputation: 136

I think you should use Selenium and scroll down with it to load the entire page.

As far as I know, requests can't handle dynamic pages: it only fetches the HTML the server sends and doesn't run the JavaScript that loads the rest.
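
A minimal sketch of the scrolling idea, assuming the page grows in height each time more content loads. The names scroll_to_bottom and load_full_page, the pause length, and the round limit are illustrative choices rather than anything from the question, and load_full_page assumes Chrome with a matching driver is installed.

```python
import time


def scroll_to_bottom(driver, pause=1.0, max_rounds=50):
    """Scroll until the page height stops growing, then return the HTML."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_rounds):
        # Jump to the current bottom of the page to trigger the next load
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the page time to fetch and render new items
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new appeared, so assume the end was reached
        last_height = new_height
    return driver.page_source


def load_full_page(url):
    # Imported here so scroll_to_bottom can be used without Selenium installed
    from selenium import webdriver

    driver = webdriver.Chrome()  # assumption: Chrome is available locally
    try:
        driver.get(url)
        return scroll_to_bottom(driver)
    finally:
        driver.quit()
```

The returned string can be passed to BeautifulSoup in place of response.content. A fixed pause is the simplest option; on a slow site it may need to be longer, or the loop may stop before the last batch has loaded.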

Upvotes: 1

Nico Müller

Reputation: 1874

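For the alternative option, you can find the <a> tags via find_all, where soup is the BeautifulSoup object for the page:

links = soup.find_all('a', href=True)

To keep only the URLs that start with the prefix, compare each tag's href attribute (the tag itself has no startswith method):

result = [link['href'] for link in links if link['href'].startswith('my_url/something/')]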
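
One thing to watch for is that href values are often relative (for example /something/first-post), so a prefix test against the full address can miss them. Here is a dependency-free sketch of the same filtering that resolves each link against the page address first. It swaps BeautifulSoup for the standard library's HTMLParser, and LinkCollector, links_with_prefix and the example.com address are hypothetical names used only for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag as an absolute URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the address of the page
            self.links.append(urljoin(self.base_url, href))


def links_with_prefix(html, base_url, prefix):
    collector = LinkCollector(base_url)
    collector.feed(html)
    return [link for link in collector.links if link.startswith(prefix)]


# Hypothetical page content and address, for illustration only
sample = '<a href="/something/first-post">one</a> <a href="/about">two</a>'
print(links_with_prefix(sample, "https://example.com/", "https://example.com/something/"))
```

This only sees links that are present in the HTML it is given, so on a page that loads more items while scrolling it still needs the fully loaded source.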

Upvotes: 0
