Reputation: 53
findAll doesn't find the class I need. I was able to find its parent element, but the data inside it is not well organized.
Please see the HTML below and the images.
import bs4
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
my_url = 'https://www.vivino.com/explore?e=eJzLLbI11jNVy83MszU0UMtNrLA1MVBLrrQtLVYrsDVUK7ZNTlQrS7YtKSpNVSsviY4FioEpIwhlDKFMIJQ5VM4EAJCfGxQ='
#Opening a connection
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
#html parse
page_soup = soup(page_html, "html.parser")
container = page_soup.findAll("div", {"class":"wine-explorer__results__item"})
len(container)
Upvotes: 1
Views: 3801
Reputation: 53
Thanks everyone. As you all suggested, the page has to be rendered by something that executes JavaScript before that class can be selected. I used Selenium in this case, though PyQt5 might be a better option.
from bs4 import BeautifulSoup as soup
from selenium import webdriver
my_url = 'https://www.vivino.com/explore?e=eJzLLbI11jNVy83MszU0UMtNrLA1MVBLrrQtLVYrsDVUK7ZNTlQrS7YtKSpNVSsviY4FioEpIwhlDKFMIJQ5VM4EAJCfGxQ='
#Open the page in a real browser so its JavaScript runs
driver = webdriver.Firefox()
driver.get(my_url)
#Grab the rendered DOM, then close the browser
html = driver.execute_script("return document.documentElement.outerHTML")
driver.quit()
#print(html)
#html parse
html_page_soup = soup(html, "html.parser")
container = html_page_soup.findAll("div", {"class": "wine-explorer__results__item"})
print(len(container))
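If the count still comes back as 0 now and then, the items may not have been rendered yet when the DOM is read. Below is a minimal sketch of the same approach with an explicit wait. The helper names `count_items` and `fetch_rendered_html` and the 15-second timeout are my own assumptions, and it presumes Firefox with geckodriver is available and that the site still uses this class name.

```python
from bs4 import BeautifulSoup

ITEM_CLASS = "wine-explorer__results__item"  # class name taken from the question


def count_items(html, item_class=ITEM_CLASS):
    """Count <div> elements that carry item_class in an HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    return len(soup.find_all("div", class_=item_class))


def fetch_rendered_html(url, timeout=15):
    """Load url in Firefox and return the DOM once at least one item exists.

    Hypothetical helper: the timeout value is an arbitrary assumption.
    """
    # Imported here so count_items can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        # Block until the JavaScript has inserted at least one result item.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, "div." + ITEM_CLASS))
        )
        return driver.page_source
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()
```

Usage would be `count_items(fetch_rendered_html(my_url))`. Keeping the parsing separate from the browser step also lets you check the selector against saved HTML without launching Firefox.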
Upvotes: 1
Reputation: 702
Try using the following instead:
container = page_soup.findAll("div", {"class": "wine-explorer__results"})
Upvotes: 0
Reputation: 11
You can use the Dryscrape module together with bs4, because the wine-explorer elements are created by JavaScript. Dryscrape renders the page with JavaScript support before you parse it.
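A quick way to confirm that the elements come from JavaScript is to check whether the class name appears anywhere in the raw response before it is parsed. This is only a rough sketch using the standard library; `class_in_raw_html` is a hypothetical helper name, and a match inside an inline script string would also return True, so treat a False result as the informative one.

```python
def class_in_raw_html(raw_html, class_name):
    """Return True if class_name occurs anywhere in the unparsed HTML."""
    if isinstance(raw_html, bytes):
        # urlopen(...).read() returns bytes; decode leniently for a text search.
        raw_html = raw_html.decode("utf-8", errors="replace")
    return class_name in raw_html
```

With the question's code, `class_in_raw_html(page_html, "wine-explorer__results__item")` returning False would mean the server never sent those divs, so no parser setting can find them and a rendering step is needed.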
Upvotes: 0