Reputation: 3591
I'm trying to parse data from a website that loads more content as the user scrolls. There is a finite number of elements that can appear while scrolling, but the following only gives the first batch (25 out of 112):
import requests
from bs4 import BeautifulSoup

url = "http://url/to/website"
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
How can I tell requests to "scroll" before returning the HTML?
EDIT: apparently requests doesn't do that; what solution can I use in Python instead?
Upvotes: 2
Views: 1087
Reputation: 3080
The key thing to understand is how the server side works. Usually, an onScroll
or onClick
(or any other) event triggers an AJAX request
to the server, and the client-side JavaScript renders the returned data (JSON/XML/...). So all you need to do is repeat those AJAX requests against the same server to get the data.
For example, the actions in the browser look like this:
1. Enter url on browser
> [HTTP GET REQUEST] http://url/to/website
2. Scroll on the page
> [AJAX GET] http://url/to/website/1
> [javascript on front-end will process those data]
3. Then, keep scrolling on the page
> [AJAX GET] http://url/to/website/2
> [javascript on front-end will process those data]
4. ... (and so on)
Q. How can you get that data with Python?
A. One simple way is to use browser > inspect > network_tab
to find which AJAX requests are sent when you scroll the page, then repeat those AJAX requests with the corresponding headers from Python.
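A minimal sketch of that approach, assuming the scrolling triggers paginated AJAX calls to a hypothetical endpoint like http://url/to/website/&lt;page&gt; that returns a JSON list; the real URL, headers, and paging scheme must be copied from what the network tab actually shows:

```python
import requests

# Hypothetical paginated AJAX endpoint discovered via browser > inspect > network_tab.
# Replace the URL, headers, and paging scheme with whatever the network tab shows.
BASE_URL = "http://url/to/website/{page}"
HEADERS = {
    "User-Agent": "Mozilla/5.0",
    "X-Requested-With": "XMLHttpRequest",  # many AJAX endpoints expect this header
}

def fetch_all_items(max_pages=10):
    """Repeat the AJAX request page by page until no more items come back."""
    items = []
    for page in range(1, max_pages + 1):
        response = requests.get(BASE_URL.format(page=page), headers=HEADERS)
        response.raise_for_status()
        batch = response.json()  # assuming the endpoint returns a JSON list
        if not batch:            # empty page -> no more data to load
            break
        items.extend(batch)
    return items
```

Each iteration plays back the same request the front-end JavaScript would have made on scroll, so the server returns the same data it would have given the browser.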
Upvotes: 2
Reputation: 599530
You can't. The question is based on a misunderstanding of what requests does: it loads the content of the page only. Endless scrolling is powered by JavaScript, which requests won't execute.
You'd need a browser automation tool like Selenium to do this, or find out which Ajax endpoint the scrolling JS is using and load that directly.
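For the Selenium route, a minimal sketch: keep scrolling to the bottom until the page height stops growing, then hand the final HTML to BeautifulSoup. The scroll-until-stable loop is factored into a function; the fixed pause and the Chrome driver are assumptions, not part of the original question:

```python
import time

def scroll_to_bottom(driver, pause=2.0):
    """Scroll until the page height stops growing, then return the final HTML."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    while True:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the AJAX request time to load new elements
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:  # nothing new loaded -> done
            break
        last_height = new_height
    return driver.page_source

# Usage (requires selenium and a matching chromedriver on PATH):
#   from selenium import webdriver
#   driver = webdriver.Chrome()
#   driver.get("http://url/to/website")
#   html = scroll_to_bottom(driver)
#   driver.quit()
```

The resulting html string can then be parsed with BeautifulSoup exactly as in the question, and it will contain all 112 elements rather than the first 25.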
Upvotes: 5