Gayan Jeewantha

Reputation: 335

Scraping issue: dynamic content (without Selenium)

I need to scrape http://www.vintagetoday.be/fr/montres but it has dynamic content.

How can I do this?

My code:

import requests
from bs4 import BeautifulSoup

t = requests.get("http://www.vintagetoday.be/fr/catalogue.awp").text
print(len(BeautifulSoup(t, "lxml").findAll("td", {"class": "Lien2"})))

The result is 16, but there are 430 articles.

Upvotes: 0

Views: 1035

Answers (2)

Ayoub_B

Reputation: 700

It's normal that you're getting just 16 links instead of 430: when the page first loads, it only contains the first 16 watches (links). To get the rest, you have to scroll down the page so that more watches appear. You can achieve this with Selenium.
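A minimal sketch of that Selenium approach, assuming Chrome and that scrolling to the bottom is what triggers the lazy load; the td.Lien2 selector comes from the question, and the number of scroll iterations is a guess you would tune:

from bs4 import BeautifulSoup
from selenium import webdriver
import time

driver = webdriver.Chrome()
driver.get("http://www.vintagetoday.be/fr/montres")

# Scroll to the bottom repeatedly so the lazy loader fetches more watches.
# 30 iterations is an assumption; increase it until all 430 articles appear.
for _ in range(30):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(1)  # give the AJAX request time to finish

# Parse the fully rendered page, not the initial HTML.
soup = BeautifulSoup(driver.page_source, "lxml")
print(len(soup.find_all("td", {"class": "Lien2"})))
driver.quit()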

A better method would be to reverse-engineer the AJAX call they use to load (paginate) the watches and make that call directly in your code. A quick look shows that they POST to the following URL to load more watches:

http://www.vintagetoday.be/fr/montres?AWPIDD9BBA1F0=27045E7B002DF1FE7C1BA8D48193FD1E54B2AAEB

I don't see any parameter that indicates the pagination, though, which means it's stored in the session. They also send some query-string parameters in the request's body, so you need to check that as well.

The return value seems to be XML, which makes it straightforward to extract the URLs.
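A rough sketch of that approach, assuming the session cookie from the first page load is what the server uses to track pagination state; the AWPID token is the one observed above but is likely session-specific, and the empty payload is a placeholder, so copy both from your browser's network tab:

import requests
from bs4 import BeautifulSoup

session = requests.Session()

# Load the listing page first so the server creates the session
# that (per the above) tracks the pagination state.
session.get("http://www.vintagetoday.be/fr/montres")

# Assumption: this token expires per session; grab a fresh one from
# the browser's network tab along with the real request body.
ajax_url = ("http://www.vintagetoday.be/fr/montres"
            "?AWPIDD9BBA1F0=27045E7B002DF1FE7C1BA8D48193FD1E54B2AAEB")
payload = {}  # placeholder: fill in the form fields the browser sends

resp = session.post(ajax_url, data=payload)

# The response appears to be XML, so parse it as such and pull the hrefs.
soup = BeautifulSoup(resp.text, "xml")
print([a.get("href") for a in soup.find_all("a")])

Repeating the POST with the same session should then return successive pages of watches, if the pagination really is tracked server-side.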

Upvotes: 0

ASH

Reputation: 20322

I'm definitely NOT an expert with this stuff, but I think this is what you want.

from bs4 import BeautifulSoup
from urllib.request import Request, urlopen

# Fetch the page and parse it with lxml.
req = Request("http://www.vintagetoday.be/fr/montres")
html_page = urlopen(req)
soup = BeautifulSoup(html_page, "lxml")

# Collect the href of every anchor tag on the page.
links = []
for link in soup.findAll('a'):
    links.append(link.get('href'))
print(links)

See the two links below for more info.

https://pythonspot.com/extract-links-from-webpage-beautifulsoup/

https://pythonprogramminglanguage.com/get-links-from-webpage/

Upvotes: 0
