Reputation: 22440
I've written a script in Python, in combination with Selenium, to parse all the coffee shop names available on a webpage. The webpage uses lazy loading, so 40 more names become visible with each scroll: if I scroll 2 times, 80 names are visible, and so on.
There are 125 names available on that page. My script below reaches the bottom of the page, handling all the scrolling, but it never breaks out of the loop to print the content.
This is my script so far:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
wait = WebDriverWait(driver, 4)
driver.get("https://www.yellowpages.ca/search/si/1/coffee/all%20states")
itemlist = []

while True:
    for elem in wait.until(EC.presence_of_all_elements_located((By.CLASS_NAME, "listing__name--link"))):
        if elem.text not in itemlist:
            itemlist.append(elem.text)
    try:
        driver.execute_script("arguments[0].scrollIntoView();", elem)
    except Exception:
        break

for item in itemlist:
    print(item)
driver.quit()
The content of that page is not generated dynamically, so I could fetch it all using requests, changing only the number in the /si/1/coffee/ portion of the URL. However, I would like to fetch it with Selenium by controlling the scroll.
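For comparison, the pagination approach mentioned above can be sketched without a browser. This is a minimal sketch, not the asker's code: it swaps requests for the standard library's urllib and html.parser so it has no dependencies, and it assumes that each page serves the names as <a> tags with the class listing__name--link (taken from the Selenium script), that the site accepts a browser-like User-Agent header, and that a page past the last one returns no listings. The helpers page_url, parse_names, fetch_names and fetch_all are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen

# URL pattern from the question; only the page number after /si/ changes.
BASE = "https://www.yellowpages.ca/search/si/{page}/coffee/all%20states"


def page_url(page: int) -> str:
    """Build the search URL for one page number."""
    return BASE.format(page=page)


class ListingNameParser(HTMLParser):
    """Collect the text of <a> tags whose class list contains listing__name--link."""

    def __init__(self) -> None:
        super().__init__()
        self.names = []
        self._in_link = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "a" and "listing__name--link" in classes:
            self._in_link = True
            self._buf = []

    def handle_data(self, data):
        if self._in_link:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            self._in_link = False
            # Collapse internal whitespace so names come out on one line
            self.names.append(" ".join("".join(self._buf).split()))


def parse_names(html: str) -> list:
    parser = ListingNameParser()
    parser.feed(html)
    return parser.names


def fetch_names(page: int) -> list:
    # Assumption: the site may reject urllib's default User-Agent.
    req = Request(page_url(page), headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=10) as resp:
        return parse_names(resp.read().decode("utf-8", errors="replace"))


def fetch_all(max_pages: int = 50) -> list:
    # Assumption: a page beyond the last one contains no listings.
    names = []
    for page in range(1, max_pages + 1):
        batch = fetch_names(page)
        if not batch:
            break
        names.extend(batch)
    return names
```

The loop stops on the first empty page rather than hard-coding a page count, since the number of pages per search is not known in advance.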
Postscript: I do not want to solve this with driver.execute_script("window.scrollTo(0, document.body.scrollHeight);") or with for item in range(3): elem.send_keys(Keys.END), as I've already succeeded using those.
All I need to know is how to break out of the loop by putting some condition inside it.
Upvotes: 1
Views: 242
Reputation: 52665
You can try implementing the following condition: break out of the loop if the number of entries stays the same within the timeout:
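# Continues from the setup in the question (driver, wait and EC are defined there)
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By

itemlist = []
while True:
    for elem in wait.until(EC.presence_of_all_elements_located((By.CLASS_NAME, "listing__name--link"))):
        if elem.text not in itemlist:
            itemlist.append(elem.text)
    # Count the listings currently loaded, before scrolling
    current_len = len(driver.find_elements(By.CLASS_NAME, "listing__name--link"))
    try:
        # Scroll the last listing into view to trigger lazy loading
        driver.execute_script("arguments[0].scrollIntoView();", elem)
        # Wait for more listings to appear; a timeout means nothing new was loaded
        wait.until(lambda d: len(d.find_elements(By.CLASS_NAME, "listing__name--link")) > current_len)
    except TimeoutException:
        break

for item in itemlist:
    print(item)
driver.quit()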
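The stop condition in this answer ("the count did not grow within the timeout") can be pulled out into a small helper and tested without a browser. This is a minimal sketch: scroll_until_stable is a hypothetical name, and count and scroll are assumed callables that you would wire to Selenium yourself. The polling loop plays the role WebDriverWait.until plays in the answer.

```python
import time
from typing import Callable


def scroll_until_stable(
    count: Callable[[], int],
    scroll: Callable[[], None],
    timeout: float = 4.0,
    poll: float = 0.5,
) -> int:
    """Scroll repeatedly and return the final count once it stops growing.

    After each scroll, poll count() until it exceeds the value seen before
    the scroll. If it has not grown within `timeout` seconds, assume the
    end of the list has been reached and stop.
    """
    while True:
        before = count()
        scroll()
        deadline = time.monotonic() + timeout
        while count() <= before:
            if time.monotonic() >= deadline:
                return count()
            time.sleep(poll)
```

In the script, count could be something like lambda: len(driver.find_elements(By.CLASS_NAME, "listing__name--link")) and scroll could run scrollIntoView on the last element of that list; the names can then be read once, after the helper returns.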
Upvotes: 3
Reputation: 5463
At the start of each pass of the while True loop, set a boolean variable done to True. Set it to False whenever you add an item to the list. After the for loop (still inside the while loop), break if done is True.
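The done-flag idea above can be sketched as a small, browser-independent function. This is a minimal sketch under assumptions: collect_until_no_new is a hypothetical name, get_batch stands for "read the texts of the currently visible listings", and scroll stands for "scroll the last listing into view". With Selenium, scroll should also wait for the next batch to load (for example with the WebDriverWait condition from the other answer); otherwise a pass can run before new names arrive, find nothing new, and stop early.

```python
from typing import Callable, Iterable, List


def collect_until_no_new(
    get_batch: Callable[[], Iterable[str]],
    scroll: Callable[[], None],
) -> List[str]:
    """Read visible names and scroll; stop after a pass that adds nothing new."""
    itemlist: List[str] = []
    while True:
        done = True  # assume this pass finds nothing new
        for name in get_batch():
            if name not in itemlist:
                itemlist.append(name)
                done = False  # something new appeared, so keep scrolling
        if done:
            break
        scroll()
    return itemlist
```

The check happens after the for loop but inside the while loop, so break exits the while loop exactly once no new names show up.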
Upvotes: 0