Reputation: 22440
I've written a script in Python using Selenium to scrape the names and prices of products from the Redmart website. My goal is to click each of the 10 category links at the top of the main page and parse all the products on the page each one leads to. However, once a category is clicked, the browser is on the newly opened page, so it has to return to the main page before it can click the next of the 10 category links. The problem is that my scraper clicks a link, goes to its target page, parses the data there, goes back to the main page, and then clicks the same link again, repeating this over and over. Here is the script I'm working with:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://redmart.com/bakery")
wait = WebDriverWait(driver, 10)

while True:
    try:
        wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "li.image-facets-pill")))
        driver.find_element_by_css_selector('img.image-facets-pill-image').click()
    except:
        break

    for elems in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "li.productPreview"))):
        name = elems.find_element_by_css_selector('h4[title] a').text
        price = elems.find_element_by_css_selector('span[class^="ProductPrice__"]').text
        print(name, price)

    driver.back()

driver.quit()
By the way, I think the "try" and "except" block in this script needs to be adjusted to get the desired output.
Upvotes: 1
Views: 776
Reputation: 52665
You can implement a simple counter that lets you iterate through the list of categories, as below:
counter = 0

while True:
    try:
        wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "li.image-facets-pill")))
        # Re-locate the category images on every pass and click the next one by index
        driver.find_elements_by_css_selector('img.image-facets-pill-image')[counter].click()
        counter += 1
    except IndexError:
        # counter has run past the last category, so every category has been visited
        break

    for elems in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "li.productPreview"))):
        name = elems.find_element_by_css_selector('h4[title] a').text
        price = elems.find_element_by_css_selector('span[class^="ProductPrice__"]').text
        print(name, price)

    # Return to the main page so the next category can be clicked
    driver.back()

driver.quit()
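The same counter idea can also be written without relying on IndexError, by checking the index against the freshly located list on each pass. The following is only a rough sketch under some assumptions: `iterate_by_index` and `scrape_redmart` are hypothetical names introduced here for illustration, the selectors are copied from the question and may no longer match the site, and the Selenium calls use the `find_element(By..., ...)` / `find_elements(By..., ...)` form, since newer Selenium releases no longer ship the `find_element_by_*` helpers.

```python
def iterate_by_index(get_items, visit):
    """Visit every item by position, re-fetching the list before each visit.

    Hypothetical helper: get_items() must return the current list of items,
    visit(item) handles one item and returns its result.
    """
    results = []
    index = 0
    while True:
        items = get_items()  # re-locate, since old references go stale after navigation
        if index >= len(items):
            break  # every position has been visited
        results.append(visit(items[index]))
        index += 1
    return results


def scrape_redmart():
    # Imports are local so the helper above stays usable without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    driver = webdriver.Chrome()
    driver.get("https://redmart.com/bakery")
    wait = WebDriverWait(driver, 10)

    def get_pills():
        # Wait for the category bar, then return all category images
        wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "li.image-facets-pill")))
        return driver.find_elements(By.CSS_SELECTOR, "img.image-facets-pill-image")

    def visit(pill):
        pill.click()
        rows = []
        products = wait.until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, "li.productPreview"))
        )
        for elem in products:
            name = elem.find_element(By.CSS_SELECTOR, "h4[title] a").text
            price = elem.find_element(By.CSS_SELECTOR, 'span[class^="ProductPrice__"]').text
            rows.append((name, price))
        driver.back()  # return to the main page for the next category
        return rows

    try:
        return iterate_by_index(get_pills, visit)
    finally:
        driver.quit()  # close the browser even if a wait times out
```

Calling `scrape_redmart()` should return one list of `(name, price)` tuples per category. Keeping the index logic in a separate function also makes it possible to check the loop with plain lists before pointing it at a browser.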
Upvotes: 1