Reputation: 91

Unable to scrape only the elements that contain a given text using Selenium

Could someone help me with an issue? I am trying to scrape the names of the dishes tagged MUST TRY, but my script prints the names of all the dishes and I don't know why.

CODE :

import time
from selenium import webdriver  
from selenium.webdriver.common.by import By


driver = webdriver.Chrome(executable_path='./chromedriver.exe')
driver.get("https://www.zomato.com/pune/bedekar-tea-stall-sadashiv-peth/order")
screen_height = driver.execute_script("return window.screen.height;")  # get the screen height of the web
i = 1
count = 0
scroll_pause_time = 1

while True:
    # scroll one screen height each time
    driver.execute_script("window.scrollTo(0, {screen_height}*{i});".format(screen_height=screen_height, i=i))
    i += 1
    time.sleep(scroll_pause_time)
    # update scroll height each time after scrolled, as the scroll height can change after we scrolled the page
    scroll_height = driver.execute_script("return document.body.scrollHeight;")
    # Break the loop when the height we need to scroll to is larger than the total scroll height
    if (screen_height) * i > scroll_height:
        break

driver.execute_script("window.scrollTo(0, 0);")

# block of code I am struggling with

dish_divs = driver.find_elements_by_xpath("//div[@class = 'sc-1s0saks-11 cYGeYt']")
for items in dish_divs:
    if items.find_element(By.XPATH, "//div[contains(text(),'MUST TRY')]"):

        name = items.find_element(By.CSS_SELECTOR,'h4.sc-1s0saks-15.iSmBPS')
        print(name.text)
    else:
        continue
driver.close()

OUTPUT :

['Misal Slice', 'Shev Chivda', 'Kharvas [Sugar]', 'Extra Rassa [1 Vati]', 'Taak', 'Extra Slice', 'Misal Slice', 'Kharvas [Jaggery]', 'Solkadhi', 'Kokam', 'Nimboo Sharbat', 'Shev Chivda', 'Batata Chivda', 'Misal Slice', 'Extra Kanda [1 Vati]', 'Extra Slice', 'Extra Rassa [1 Vati]', 'Coffee Kharvas', 'Rose Kharvas', 'Shengdana Ladoo', 'Chirota', 'Kharvas [Sugar]', 'Kharvas [Jaggery]', 'Chocolate Fudge', 'Taak', 'Kokam', 'Flavored Milk', 'Nimboo Sharbat', 'Solkadhi', 'Dahi']

EXPECTED OUTPUT : only the dishes with the MUST TRY tag, as in the image below. My script returns all the names, not just the tagged ones.

Upvotes: 1

Answers (3)

gangabass

Reputation: 10666

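In items.find_element(By.XPATH, "//div[contains(text(),'MUST TRY')]") the XPath starts with //, so it searches the whole document from the root even though it is called on items. That is why the check succeeds for every dish. You need a relative XPath, starting with .//, which searches only inside the current element: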

items.find_element(By.XPATH, ".//div[contains(text(),'MUST TRY')]")
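
To illustrate the difference without a browser, here is a minimal sketch that uses the standard library's xml.etree.ElementTree as a stand-in for Selenium. The markup, the `dish` class, and the names `SAMPLE`, `TAG`, `unscoped` and `scoped` are assumptions made up for this example; the real page is more complex. The document-wide search is emulated by searching from `root`, which is what a leading `//` does in Selenium.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, invented for illustration; the real page differs.
SAMPLE = """
<section>
  <div class="dish">
    <h4>Misal Slice</h4>
    <div><div><div type="tag">MUST TRY</div></div></div>
  </div>
  <div class="dish">
    <h4>Taak</h4>
  </div>
</section>
"""

root = ET.fromstring(SAMPLE)
dishes = root.findall("./div[@class='dish']")

# Any descendant div marked as a tag whose full text is exactly 'MUST TRY'.
TAG = ".//div[@type='tag'][.='MUST TRY']"

# Searching from the document root: the result is the same for every dish,
# so every dish passes the check (the behaviour seen in the question).
unscoped = [d.find("h4").text for d in dishes if root.find(TAG) is not None]

# Searching from the dish itself: only dishes that contain the tag pass.
scoped = [d.find("h4").text for d in dishes if d.find(TAG) is not None]

print(unscoped)  # ['Misal Slice', 'Taak']
print(scoped)    # ['Misal Slice']
```

One Selenium detail worth keeping in mind: find_element raises NoSuchElementException when nothing matches, so inside an if it is safer to call find_elements, which returns an empty list for dishes without the tag.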

You can get the same result with a single XPath:

//div[div/div[@type="tag"][.="MUST TRY"]]/preceding-sibling::h4[1]/text()

Also, I don't recommend parsing HTML with Selenium; it is really slow for this. I recommend lxml or BeautifulSoup instead.

You can use the XPath above like this:

from lxml import html

....
content = driver.page_source
tree = html.fromstring(content)

titles = tree.xpath('//div[div/div[@type="tag"][.="MUST TRY"]]/preceding-sibling::h4[1]/text()')
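
For a version that runs without a browser, the sketch below swaps driver.page_source for a hand-written HTML string. The markup in `PAGE` is an assumption that only mirrors the nesting the XPath expects (an h4 followed by a sibling div that wraps the tag); it is not the site's actual HTML. It requires lxml to be installed.

```python
from lxml import html

# Hypothetical page source, invented for illustration.
PAGE = """<html><body>
<div class="dish"><h4>Misal Slice</h4><div><div><div type="tag">MUST TRY</div></div></div></div>
<div class="dish"><h4>Taak</h4><div><div><div type="tag">BESTSELLER</div></div></div></div>
<div class="dish"><h4>Solkadhi</h4></div>
</body></html>"""

tree = html.fromstring(PAGE)

# Pick each div that wraps a MUST TRY tag, then step back to the nearest
# preceding h4 sibling and take its text.
titles = tree.xpath(
    '//div[div/div[@type="tag"][.="MUST TRY"]]/preceding-sibling::h4[1]/text()'
)

print(titles)  # ['Misal Slice']
```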

Upvotes: 0

cruisepandey

Reputation: 29372

Just try this XPath:

//div[text()='MUST TRY']/../../../h4

and use it in your code like this:

for name in driver.find_elements(By.XPATH, "//div[text()='MUST TRY']/../../../h4"):
    print(name.text)
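
The idea of climbing from the tag to its enclosing dish with `..` steps can be tried offline. This is a rough sketch using the standard library's xml.etree.ElementTree in place of Selenium. ElementTree has no `text()` test, so `[.='MUST TRY']` is used instead, and the `[@type='tag']` predicate, the markup and the name `SAMPLE` are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the tag sits three levels below the dish container.
SAMPLE = """
<section>
  <div class="dish">
    <h4>Misal Slice</h4>
    <div><div><div type="tag">MUST TRY</div></div></div>
  </div>
  <div class="dish">
    <h4>Taak</h4>
    <div><div><div type="tag">BESTSELLER</div></div></div>
  </div>
</section>
"""

root = ET.fromstring(SAMPLE)

# Find every MUST TRY tag, go up three parents to the dish container,
# then select the h4 child of that container.
headings = root.findall(".//div[@type='tag'][.='MUST TRY']/../../../h4")
names = [h.text for h in headings]

print(names)  # ['Misal Slice']
```

The number of `..` steps depends on how deeply the tag is nested, so this kind of path breaks if the site changes its layout.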

Upvotes: 1

theNishant

Reputation: 665

Instead of

dish_divs = driver.find_elements_by_xpath("//div[@class = 'sc-1s0saks-11 cYGeYt']")
for items in dish_divs:
    if items.find_element(By.XPATH, "//div[contains(text(),'MUST TRY')]"):

        name = items.find_element(By.CSS_SELECTOR,'h4.sc-1s0saks-15.iSmBPS')
        print(name.text)
    else:
        continue

You can use

dish_headings = driver.find_elements(By.XPATH, '//div[@class="sc-1s0saks-1 dpXgPd"]/preceding-sibling::h4')
for heading in dish_headings:
    print(heading.text)

This will make your code more readable and easier to maintain.

Upvotes: 1
