Reputation: 91
Could someone help me with an issue? I am trying to scrape the names of the dishes that carry the MUST TRY tag, but my script prints the names of all dishes and I don't know why.
CODE :
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Chrome(executable_path='./chromedriver.exe')
driver.get("https://www.zomato.com/pune/bedekar-tea-stall-sadashiv-peth/order")
screen_height = driver.execute_script("return window.screen.height;") # get the screen height of the web
i = 1
count = 0
scroll_pause_time = 1
while True:
    # scroll one screen height each time
    driver.execute_script("window.scrollTo(0, {screen_height}*{i});".format(screen_height=screen_height, i=i))
    i += 1
    time.sleep(scroll_pause_time)
    # update scroll height each time after scrolled, as the scroll height can change after we scrolled the page
    scroll_height = driver.execute_script("return document.body.scrollHeight;")
    # Break the loop when the height we need to scroll to is larger than the total scroll height
    if (screen_height) * i > scroll_height:
        break
driver.execute_script("window.scrollTo(0, 0);")
# block of code I am struggling with
dish_divs = driver.find_elements_by_xpath("//div[@class = 'sc-1s0saks-11 cYGeYt']")
for items in dish_divs:
    if items.find_element(By.XPATH, "//div[contains(text(),'MUST TRY')]"):
        name = items.find_element(By.CSS_SELECTOR,'h4.sc-1s0saks-15.iSmBPS')
        print(name.text)
    else:
        continue
driver.close()
OUTPUT :
['Misal Slice', 'Shev Chivda', 'Kharvas [Sugar]', 'Extra Rassa [1 Vati]', 'Taak', 'Extra Slice', 'Misal Slice', 'Kharvas [Jaggery]', 'Solkadhi', 'Kokam', 'Nimboo Sharbat', 'Shev Chivda', 'Batata Chivda', 'Misal Slice', 'Extra Kanda [1 Vati]', 'Extra Slice', 'Extra Rassa [1 Vati]', 'Coffee Kharvas', 'Rose Kharvas', 'Shengdana Ladoo', 'Chirota', 'Kharvas [Sugar]', 'Kharvas [Jaggery]', 'Chocolate Fudge', 'Taak', 'Kokam', 'Flavored Milk', 'Nimboo Sharbat', 'Solkadhi', 'Dahi']
EXPECTED OUTPUT : only the dishes that carry the MUST TRY tag, as in the image below. My script returns every dish name instead of only the tagged ones.
Upvotes: 1
Views: 123
Reputation: 10666
Here, in items.find_element(By.XPATH, "//div[contains(text(),'MUST TRY')]"), the XPath starts with //, so it searches all elements from the document root instead of only inside items. What you need is a relative XPath starting with .// (search only in the current element):
items.find_element(By.XPATH, ".//div[contains(text(),'MUST TRY')]")
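To make the scoping difference concrete, here is a minimal sketch that runs without a browser. It uses the standard library's ElementTree on a hypothetical, simplified menu fragment (the markup, the "dish" class and the helper has_must_try are assumptions for illustration, not Zomato's real structure). Checking from the document root gives the same answer for every dish, while checking from each dish only looks inside that dish.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified menu markup; the real page's structure may differ.
MENU = """
<div class="menu">
  <div class="dish">
    <h4>Misal Slice</h4>
    <div><div><div type="tag">MUST TRY</div></div></div>
  </div>
  <div class="dish">
    <h4>Taak</h4>
    <div><div><div type="tag">BESTSELLER</div></div></div>
  </div>
  <div class="dish">
    <h4>Solkadhi</h4>
  </div>
</div>
"""

root = ET.fromstring(MENU)
dishes = root.findall("./div[@class='dish']")


def has_must_try(scope):
    """Return True if a MUST TRY tag exists anywhere under `scope`."""
    tags = scope.findall(".//div[@type='tag']")
    return any((tag.text or "").strip() == "MUST TRY" for tag in tags)


# Searching from the document root: the answer is the same for every dish,
# so every name is kept (this mirrors the behaviour in the question).
unscoped = [dish.find("h4").text for dish in dishes if has_must_try(root)]

# Searching from each dish: only that dish's own subtree is inspected.
scoped = [dish.find("h4").text for dish in dishes if has_must_try(dish)]

print(unscoped)  # ['Misal Slice', 'Taak', 'Solkadhi']
print(scoped)    # ['Misal Slice']
```

One more thing to keep in mind in the Selenium version: find_element raises NoSuchElementException when nothing matches, so the else branch is never reached. Using items.find_elements(By.XPATH, ".//div[contains(text(),'MUST TRY')]") returns an empty list for dishes without the tag, which works in an if test.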
You can get the same result with a single XPath:
//div[div/div[@type="tag"][.="MUST TRY"]]/preceding-sibling::h4[1]/text()
Also, I don't recommend parsing HTML with Selenium, as it is really slow for this. I recommend using lxml or BeautifulSoup instead.
You can use the XPath above like this:
from lxml import html
....
content = driver.page_source
tree = html.fromstring(content)
titles = tree.xpath('//div[div/div[@type="tag"][.="MUST TRY"]]/preceding-sibling::h4[1]/text()')
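As a self-contained check of that XPath, here is a sketch that runs it against a hypothetical fragment standing in for driver.page_source. It assumes lxml is installed; the PAGE markup and its class names are invented for illustration, and the real page may nest its elements differently.

```python
from lxml import html

# Hypothetical markup standing in for driver.page_source.
PAGE = """
<div class="menu">
  <div class="dish">
    <h4>Misal Slice</h4>
    <div><div><div type="tag">MUST TRY</div></div></div>
  </div>
  <div class="dish">
    <h4>Taak</h4>
    <div><div><div type="tag">BESTSELLER</div></div></div>
  </div>
  <div class="dish">
    <h4>Solkadhi</h4>
    <div><div><div type="tag">MUST TRY</div></div></div>
  </div>
</div>
"""

tree = html.fromstring(PAGE)

# Select every div that wraps a MUST TRY tag two levels down, then step
# back to the nearest preceding <h4> sibling and take its text.
titles = tree.xpath(
    '//div[div/div[@type="tag"][.="MUST TRY"]]/preceding-sibling::h4[1]/text()'
)

print(list(titles))  # ['Misal Slice', 'Solkadhi']
```

With a live page, replace PAGE with driver.page_source after the scrolling loop has loaded the whole menu.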
Upvotes: 0
Reputation: 29362
Just try this XPath:
//div[text()='MUST TRY']/../../../h4
and use it in code like this:
for name in driver.find_elements(By.XPATH, "//div[text()='MUST TRY']/../../../h4"):
    print(name.text)
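To see what the three /.. steps do, here is a small sketch using lxml on a hypothetical fragment (lxml being installed, the markup and the "dish" class are assumptions for illustration). It climbs from the tag to its great-grandparent by hand and then shows that the XPath reaches the same h4.

```python
from lxml import html

# Hypothetical markup; the nesting depth on the real page may differ.
MENU = """
<div class="menu">
  <div class="dish">
    <h4>Misal Slice</h4>
    <div><div><div>MUST TRY</div></div></div>
  </div>
  <div class="dish">
    <h4>Taak</h4>
    <div><div><div>BESTSELLER</div></div></div>
  </div>
</div>
"""

tree = html.fromstring(MENU)

# Step by step: each getparent() call corresponds to one "/.." in the XPath.
tag = tree.xpath("//div[text()='MUST TRY']")[0]
container = tag.getparent().getparent().getparent()
manual_name = container.find("h4").text

# The same walk expressed as a single XPath.
names = [h4.text for h4 in tree.xpath("//div[text()='MUST TRY']/../../../h4")]

print(container.get("class"))  # dish
print(manual_name)             # Misal Slice
print(names)                   # ['Misal Slice']
```

The number of /.. steps depends on how deeply the tag is nested, so this XPath needs adjusting if the page layout changes.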
Upvotes: 1
Reputation: 665
Instead of
dish_divs = driver.find_elements_by_xpath("//div[@class = 'sc-1s0saks-11 cYGeYt']")
for items in dish_divs:
    if items.find_element(By.XPATH, "//div[contains(text(),'MUST TRY')]"):
        name = items.find_element(By.CSS_SELECTOR,'h4.sc-1s0saks-15.iSmBPS')
        print(name.text)
    else:
        continue
You can use
dish_divs = driver.find_elements_by_xpath('//div[@class="sc-1s0saks-1 dpXgPd"]/preceding-sibling::h4')
for items in dish_divs:
    print(items.text)
This will make your code more readable and easier to maintain.
Upvotes: 1