SIM

Reputation: 22440

Unable to exit a loop when the browser reaches the bottom of a webpage

I've written a script in Python with Selenium to parse all the coffee shop names available on a webpage. The page uses lazy loading, so each scroll reveals 40 more names: after two scrolls 80 names are visible, and so on.

There are 125 names available on that page. My script below reaches the bottom of the page, handling all the scrolling, but it can't break out of the loop in order to print the content.

This is my script so far:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
wait = WebDriverWait(driver, 4)
driver.get("https://www.yellowpages.ca/search/si/1/coffee/all%20states")

itemlist = []
while True:
    for elem in wait.until(EC.presence_of_all_elements_located((By.CLASS_NAME,"listing__name--link"))):
        if elem.text not in itemlist:
            itemlist.append(elem.text)

    try:
        driver.execute_script("arguments[0].scrollIntoView();",elem)
    except Exception:break

for item in itemlist:
    print(item)

driver.quit()

The content of that page is not generated dynamically, so I could fetch all of it using requests alone by changing the number in the /si/1/coffee/ portion of the URL. However, I would like to fetch it using Selenium by controlling the scroll.

Postscript: I do not wish to solve this with driver.execute_script("window.scrollTo(0, document.body.scrollHeight);") or for item in range(3): elem.send_keys(Keys.END), as I've already had success with those.

All I need to know is how to break out of the loop by putting a condition inside it.

Upvotes: 1

Views: 242

Answers (2)

Andersson

Reputation: 52665

You can try the following condition: break out of the loop if the number of entries stays the same until the wait times out:

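itemlist = []
while True:
    # Collect the names currently present, skipping ones already seen
    for elem in wait.until(EC.presence_of_all_elements_located((By.CLASS_NAME, "listing__name--link"))):
        if elem.text not in itemlist:
            itemlist.append(elem.text)
    # Remember how many entries are loaded before scrolling
    current_len = len(driver.find_elements(By.CLASS_NAME, "listing__name--link"))
    try:
        # Scroll the last entry into view to trigger lazy loading
        driver.execute_script("arguments[0].scrollIntoView();", elem)
        # Wait for more entries; this raises once the timeout passes with nothing new
        wait.until(lambda driver: len(driver.find_elements(By.CLASS_NAME, "listing__name--link")) > current_len)
    except Exception:
        break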

for item in itemlist:
    print(item)

driver.quit()

Upvotes: 3

Dakshinamurthy Karra

Reputation: 5463

Within the while True loop, keep a boolean variable done, set to True at the start of each pass. Set it to False whenever you add an item to the list.

After the inner for loop, break if done is True.
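A minimal sketch of that flag idea, kept independent of Selenium so it can run on its own. The names collect_until_stable, read_visible, scroll_down and FakePage are hypothetical and only for illustration; FakePage merely imitates a page that reveals 40 more names per scroll.

```python
from typing import Callable, Iterable, List


def collect_until_stable(
    read_visible: Callable[[], Iterable[str]],
    scroll_down: Callable[[], None],
) -> List[str]:
    """Collect names until one full pass adds nothing new."""
    items: List[str] = []
    while True:
        done = True  # assume we are finished until this pass finds a new name
        for name in read_visible():
            if name not in items:
                items.append(name)
                done = False  # something new appeared, so keep scrolling
        if done:
            break
        scroll_down()
    return items


class FakePage:
    """Hypothetical stand-in for a lazy-loading page (not a Selenium object)."""

    def __init__(self, total: int = 125, batch: int = 40) -> None:
        self.names = [f"Shop {i}" for i in range(1, total + 1)]
        self.batch = batch
        self.visible = batch  # the first batch is visible before any scroll

    def read_visible(self) -> List[str]:
        return self.names[: self.visible]

    def scroll_down(self) -> None:
        # Each scroll reveals one more batch, capped at the total
        self.visible = min(self.visible + self.batch, len(self.names))


page = FakePage()
result = collect_until_stable(page.read_visible, page.scroll_down)
print(len(result))
```

With Selenium, read_visible would presumably return the text of the located elements and scroll_down would scroll the last element into view. New entries may take a moment to load after a scroll, so some wait between scrolling and the next pass is likely needed; otherwise the loop could stop too early.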

Upvotes: 0
