SIM

Reputation: 22440

My scraper fails to get all the items from a webpage

I've written some code in Python, in combination with Selenium, to parse different product names from a webpage. A few "load more" buttons appear as the browser scrolls down, and the webpage only displays its full content once it has been scrolled to the very bottom and there is no "load more" button left to click. My scraper seems to be working, but I'm not getting all the results: there are around 200 products on that page, but I'm only getting 90 of them. What change should I make to my scraper to get them all? Thanks in advance.

The webpage I'm dealing with: Page_Link

This is the script I'm trying with:

import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("put_above_url_here")
wait = WebDriverWait(driver, 10)

page = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR,".listing_item")))
for scroll in range(17):
    page.send_keys(Keys.PAGE_DOWN)
    time.sleep(2)
    try:
        load = driver.find_element(By.CSS_SELECTOR, ".lm-btm")
        load.click()
    except Exception:
        pass

for item in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "[id^=item_]"))):
    name = item.find_element(By.CSS_SELECTOR, ".pro-name.el2").text
    print(name)
driver.quit()

Upvotes: 2

Views: 218

Answers (2)

Darshan Jadav

Reputation: 484

You should only use Selenium as a last resort.

A simple look around in the webpage showed the API it called to get your data.

It returns a JSON output with all the details:

Link

You can now just loop over the response and store it in a dataframe easily.

It's very fast and less error-prone than Selenium.
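As a rough illustration of the loop-and-store step (the actual endpoint isn't shown in this answer, so the JSON shape and field names below are assumptions, demonstrated on an inline sample payload; with a real endpoint you would get the dict via `requests.get(api_url).json()` instead):

```python
import json

# Hypothetical example payload: assume the site's search API returns
# JSON shaped roughly like this (real field names may differ).
sample_response = json.loads("""
{
  "products": [
    {"name": "Shampoo A", "price": 199},
    {"name": "Shampoo B", "price": 249}
  ]
}
""")

# Loop over the payload and collect one row per product; these rows
# could then be handed straight to pandas via pd.DataFrame(rows).
rows = [{"name": p["name"], "price": p["price"]}
        for p in sample_response["products"]]

for row in rows:
    print(row["name"], row["price"])
```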

Upvotes: 0

Andersson

Reputation: 52675

Try the code below to get the required data:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://www.purplle.com/search?q=hair%20fall%20shamboo")
wait = WebDriverWait(driver, 10)

# Hide the fixed header so it doesn't intercept clicks on the button
header = driver.find_element(By.TAG_NAME, "header")
driver.execute_script("arguments[0].style.display='none';", header)

while True:

    try:
        page = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, ".listing_item")))
        driver.execute_script("arguments[0].scrollIntoView();", page)
        page.send_keys(Keys.END)
        load = wait.until(EC.element_to_be_clickable((By.PARTIAL_LINK_TEXT, "LOAD MORE")))
        driver.execute_script("arguments[0].scrollIntoView();", load)
        load.click()
        wait.until(EC.staleness_of(load))
    except Exception:
        # No "LOAD MORE" button left within the wait timeout: all items loaded
        break

for item in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "[id^=item_]"))):
    name = item.find_element(By.CSS_SELECTOR, ".pro-name.el2").text
    print(name)
driver.quit()

Upvotes: 3
