Reputation: 1487
I'm trying to make a bot that sends me an email whenever a new product appears on a website.
I tried to do that with requests and BeautifulSoup.
This is my code:
import requests
from bs4 import BeautifulSoup
URL = 'https://www.vinted.fr/vetements?search_text=football&size_id[]=207&price_from=0&price_to=15&order=newest_first'
headers = {'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36"}
page = requests.get(URL, headers=headers)
soup = BeautifulSoup(page.content, 'html.parser')
products = soup.find_all("div", class_="c-box")
print(len(products))
Next, I want to compare the number of products before and after each new request in a loop.
But when I check what was found, I get an empty list: []
I don't know how to fix that.
The div I'm targeting is nested inside other divs; I don't know whether that matters.
Thanks in advance
Upvotes: 1
Views: 240
Reputation: 31
You should always check the data you actually receive.
Convert your BeautifulSoup object to a string, for example with str(soup), and write it to a file. Then inspect what the website really returned. In this case, the response contains no element with the c-box class.
You should use selenium instead of requests.
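The check described above could be sketched roughly as follows. The helper names `dump_page` and `count_by_class` and the output file name `page_dump.html` are made up for illustration; the URL, headers, and `c-box` class are taken from the question, and the site's markup may have changed since.

```python
import requests
from bs4 import BeautifulSoup


def count_by_class(html, tag, css_class):
    """Return how many <tag> elements in the given HTML carry css_class."""
    soup = BeautifulSoup(html, "html.parser")
    return len(soup.find_all(tag, class_=css_class))


def dump_page(url, headers, path="page_dump.html"):
    """Fetch the page, save the raw HTML to a file for inspection, and return it."""
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()
    with open(path, "w", encoding="utf-8") as f:
        f.write(response.text)
    return response.text


if __name__ == "__main__":
    # URL and headers as given in the question.
    URL = ("https://www.vinted.fr/vetements?search_text=football"
           "&size_id[]=207&price_from=0&price_to=15&order=newest_first")
    HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                             "AppleWebKit/537.36 (KHTML, like Gecko) "
                             "Chrome/84.0.4147.89 Safari/537.36"}
    html = dump_page(URL, HEADERS)
    # A count of 0 here means the class is absent from the raw response.
    print(count_by_class(html, "div", "c-box"))
```

Opening the saved file in a text editor and searching for the class name shows whether the elements are present in what the server sent, independent of what the browser later displays.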
Upvotes: 1
Reputation: 570
The problem is with the website you are trying to parse.
The site generates the elements you are looking for (div.c-box) with JavaScript on the client side, after the initial page has loaded. The sequence looks like this:
Browser gets HTML source from server --(1)--> JS files are loaded as the browser processes the HTML source --> JS files add elements to the HTML --(2)--> those elements are rendered in the browser
You cannot fetch the data you want with requests.get, because requests.get only retrieves the HTML source at point (1), whereas the website adds the data at point (2). To fetch such data, use a browser automation module such as selenium.
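A minimal sketch of that approach, assuming Selenium 4 with Chrome and a matching driver available on the machine. The function names `fetch_rendered_html` and `count_products` are hypothetical, and the `div.c-box` selector is simply the one from the question, so it may need adjusting to the site's current markup.

```python
from bs4 import BeautifulSoup


def fetch_rendered_html(url, selector="div.c-box", timeout=15):
    """Load the page in headless Chrome and return the HTML after JS has run."""
    # Imported here so count_products() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until at least one matching element exists, i.e. point (2) above.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, selector))
        )
        return driver.page_source
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()


def count_products(html, selector="div.c-box"):
    """Count the elements matching the CSS selector in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return len(soup.select(selector))


if __name__ == "__main__":
    # URL as given in the question.
    URL = ("https://www.vinted.fr/vetements?search_text=football"
           "&size_id[]=207&price_from=0&price_to=15&order=newest_first")
    print(count_products(fetch_rendered_html(URL)))
```

Because `driver.page_source` is read only after the wait succeeds, the existing BeautifulSoup parsing can stay as it is; only the fetching step changes.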
Upvotes: 1