SIM

Reputation: 22440

Unable to collect all the shop names from a webpage

I've written a script in Python to parse some names from a webpage. The items on that webpage are not all displayed at once; you have to scroll to the bottom to make the page release a few more items, then a few more with another scroll, and so on until all of them are visible. The problem is that the items are not located in the page body, which is why (IMO) the command driver.execute_script("return document.body.scrollHeight;") is not working. They sit in a sliding container on the left side of the page. How can I reach the bottom of that container and parse the names from this webpage? I've written almost all of the code except for the part that controls the lazy-load. I'm attaching an image to show what I mean by calling it a sliding container.

The link to that webpage: Link

This is what I've tried so far:

import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
wait = WebDriverWait(driver, 10)
driver.get("replace_the_above_link")

check_height = driver.execute_script("return document.body.scrollHeight;")
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(3)
    height = driver.execute_script("return document.body.scrollHeight;") 
    if height == check_height: 
        break 
    check_height = height

for item in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".select_list h2 a"))):
    print(item.text)

driver.quit()

This is an image of the box that contains the items: Click Here

Currently my scraper only parses the items that are visible when the page first loads.

Upvotes: 2

Views: 98

Answers (1)

Andersson

Reputation: 52675

The code below should let you trigger the XHR requests by scrolling the container as many times as needed, and then scrape the required data:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import TimeoutException

driver = webdriver.Chrome()
wait = WebDriverWait(driver, 10)
driver.get("https://www.weedsta.com/dispensaries/in/california")

entries_count = len(wait.until(EC.presence_of_all_elements_located((By.CLASS_NAME, "select_list"))))

while True:
    # Send END to an element inside the container so the container itself scrolls
    driver.find_element(By.CLASS_NAME, "tel").send_keys(Keys.END)
    try:
        # Wait until the scroll has triggered the XHR and new entries have appeared
        wait.until(lambda driver: entries_count < len(driver.find_elements(By.CLASS_NAME, "select_list")))
        # Remember the new total so the next iteration waits for further growth
        entries_count = len(driver.find_elements(By.CLASS_NAME, "select_list"))
    except TimeoutException:
        # No new entries within the timeout: we've reached the bottom
        break

for item in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".select_list h2 a"))):
    print(item.text)

driver.quit()
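The loop above is an instance of a general "scroll until nothing new loads" pattern: repeat the scroll action and stop once the item count stops growing. Here is a browser-free sketch of that pattern, where fake_load_more is a hypothetical stand-in for the real scroll-and-XHR step (it reveals up to 3 of 10 simulated items per call):

```python
def scroll_until_stable(get_count, load_more, max_rounds=50):
    """Repeat load_more() until get_count() stops increasing."""
    count = get_count()
    for _ in range(max_rounds):
        load_more()
        new_count = get_count()
        if new_count == count:
            # Nothing new appeared: we've hit the bottom
            return new_count
        count = new_count  # more items loaded, keep going
    return count

# Simulated lazy-loading list: each "scroll" reveals up to 3 more items
items = []
def fake_load_more():
    items.extend(range(len(items), min(len(items) + 3, 10)))

total = scroll_until_stable(lambda: len(items), fake_load_more)
print(total)  # 10
```

In the Selenium version, get_count corresponds to counting the .select_list elements and load_more to sending Keys.END into the container, with the wait.until call replacing the direct equality check so the XHR has time to finish.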

Upvotes: 3
