Reputation: 756
I need to store an entire HTML page in a str variable. I'm doing this:
import requests
from bs4 import BeautifulSoup
url = my_url
response = requests.get(url)
page = str(BeautifulSoup(response.content, "html.parser"))
This works, but the page at my_url is not "complete". It is a website where new content loads as you scroll to the bottom, and I need the whole page, not only the part that is visible initially.
Is there a way to load the entire page and then store it?
I also tried loading the page manually and then looking at the source code, but the final part of the page is still missing there.
Alternatively, all I want from the my_url page are the links inside it, and all of them look like:
my_url/something/first-post
my_url/something/second-post
Is there another way to find all the links, that is, every URL that starts with "my_url/something/"?
Thanks in advance
Upvotes: 0
Views: 1438
Reputation: 136
I think you should use Selenium and scroll down with it so that the entire page gets loaded.
As far as I know, requests can't handle dynamic pages: it only downloads the initial HTML and does not run the page's JavaScript.
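A rough sketch of the scrolling idea, in case it helps. It assumes Chrome with a matching driver is available, that the page grows document.body.scrollHeight when it loads more content, and that a one second pause is long enough for each batch. The names scroll_to_bottom and fetch_full_page are made up for illustration:

```python
import time


def scroll_to_bottom(driver, pause=1.0, max_rounds=50):
    """Scroll until the page height stops growing; return the number of scrolls."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:
        # Jump to the current bottom, which triggers the lazy loading.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # assumed to be enough time for new content to arrive
        new_height = driver.execute_script("return document.body.scrollHeight")
        rounds += 1
        if new_height == last_height:
            break  # nothing new was loaded, so this is treated as the end
        last_height = new_height
    return rounds


def fetch_full_page(url, pause=1.0):
    """Open url in Chrome, scroll to the end, and return the rendered HTML as str."""
    from selenium import webdriver  # imported here so only this function needs Selenium

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        scroll_to_bottom(driver, pause=pause)
        return driver.page_source
    finally:
        driver.quit()
```

Calling page = fetch_full_page(my_url) would then give a string that can be passed to BeautifulSoup as before. The max_rounds cap is only there so that a truly endless feed cannot loop forever.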
Upvotes: 1
Reputation: 1874
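For the alternative option, you can find the <a> tags via find_all:
soup = BeautifulSoup(response.content, 'html.parser')
links = soup.find_all('a')
To keep only the ones starting with your prefix, compare each tag's href attribute (the tag object itself is not a string, so calling startswith on it fails):
result = [link['href'] for link in links if link.get('href', '').startswith('my_url/something/')]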
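One thing to watch for: href values are often relative (for example /something/first-post), so a plain prefix check against the full URL can miss them. A small sketch that resolves each href first and drops duplicates; filter_links is a hypothetical helper, and the example.com URLs are placeholders standing in for my_url:

```python
from urllib.parse import urljoin


def filter_links(hrefs, base_url, prefix):
    """Resolve each href against base_url and keep the unique ones starting with prefix."""
    seen = set()
    result = []
    for href in hrefs:
        if not href:  # <a> tags without an href yield None
            continue
        absolute = urljoin(base_url, href)
        if absolute.startswith(prefix) and absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


# With BeautifulSoup, the hrefs would come from something like:
#   hrefs = [a.get("href") for a in soup.find_all("a")]
hrefs = [
    "/something/first-post",
    "https://example.com/something/second-post",
    "/about",
    None,
    "/something/first-post",
]
links = filter_links(hrefs, "https://example.com/", "https://example.com/something/")
print(links)
```

Note that this only sees links present in the HTML you parsed, so on a page that loads more posts while scrolling you would still need the fully loaded source.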
Upvotes: 0