CoMartel

Reputation: 3591

Request returns partial page

I'm trying to parse data from a website that loads more content as the user scrolls. Only a finite number of elements can appear while scrolling, but the following code returns just the first part (25 out of 112):

import requests
from bs4 import BeautifulSoup

url = "http://url/to/website"
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")

How can I tell requests to "scroll" before returning the HTML?

EDIT: apparently requests doesn't do that. What solution can I use in Python?

Upvotes: 2

Views: 1087

Answers (2)

Kir Chou

Reputation: 3080

The only thing you need to know is how the server works.

Usually, onScroll, onClick, or any other event triggers an AJAX request to the server, and the client-side JavaScript renders the returned data (JSON, XML, ...). So the only thing you need to do is repeat those AJAX requests to the same server to get the data.

For example, the browser's actions look like this:

1. Enter the URL in the browser
> [HTTP GET REQUEST] http://url/to/website

2. Scroll on the page
> [AJAX GET] http://url/to/website/1
> [JavaScript on the front end processes the returned data]

3. Keep scrolling on the page
> [AJAX GET] http://url/to/website/2
> [JavaScript on the front end processes the returned data]

4. ... (and so on)

Q. How can I use Python to get the data?

A. One simple way is to use the browser's developer tools (Inspect > Network tab) to find which AJAX requests are sent when you scroll the page, then repeat those requests from Python with the corresponding headers.
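As a rough illustration of repeating those requests, here is a minimal sketch. It assumes the endpoint follows the numbered pattern from the example above, that each page returns a JSON list, and that an empty list marks the end. The function name `fetch_all_items`, the `{page}` placeholder, and the `X-Requested-With` header are illustrative assumptions; the real URL, headers, and response shape have to be copied from the Network tab.

```python
def fetch_all_items(session, url_template, max_pages=50):
    """Collect items from a paginated AJAX endpoint until a page is empty.

    `session` is any object with a requests-style .get(), for example
    requests.Session(). `url_template` is a hypothetical URL containing
    a {page} placeholder.
    """
    # Assumed header: some servers only answer requests that look like AJAX.
    # Replace it with the headers seen in the browser's Network tab.
    headers = {"X-Requested-With": "XMLHttpRequest"}
    items = []
    for page in range(1, max_pages + 1):
        response = session.get(
            url_template.format(page=page), headers=headers, timeout=10
        )
        response.raise_for_status()
        batch = response.json()  # assumes each page is a JSON list
        if not batch:  # assumes an empty list means no more data
            break
        items.extend(batch)
    return items


# Possible usage (needs network access and the real endpoint):
#   import requests
#   items = fetch_all_items(requests.Session(), "http://url/to/website/{page}")
```

The `max_pages` cap is only a safety net in case the server never returns an empty page.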

Upvotes: 2

Daniel Roseman

Reputation: 599530

You can't. The question is based on a misunderstanding of what requests does; it loads the content of the page only. Endless scrolling is powered by JavaScript, which requests won't do anything with.

You'd need some browser automation tools like Selenium to do this; or find out what Ajax endpoint the scrolling JS is using and load that directly.
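For the browser automation route, a minimal sketch with Selenium might look like the following. It assumes the page grows in height whenever new items load and that a fixed pause is long enough for each batch to arrive; the helper name `scroll_to_bottom` and its parameters are illustrative, not part of Selenium.

```python
import time


def scroll_to_bottom(driver, pause=1.0, max_rounds=30):
    """Scroll until the page height stops growing; return the scroll count.

    `driver` is a Selenium WebDriver that has already loaded the page.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        # Jump to the bottom so the page's scroll handler fires.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # crude wait for the next batch to load
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:  # nothing new appeared
            return rounds
        last_height = new_height
    return max_rounds


# Possible usage (needs Selenium and a browser driver installed):
#   from selenium import webdriver
#   from bs4 import BeautifulSoup
#   driver = webdriver.Firefox()
#   driver.get("http://url/to/website")
#   scroll_to_bottom(driver)
#   soup = BeautifulSoup(driver.page_source, "html.parser")
#   driver.quit()
```

After the loop finishes, `driver.page_source` holds the rendered HTML, which can be parsed the same way as `response.text` in the question.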

Upvotes: 5
