Reputation: 22440
I've written a script using Python and Selenium to get all the text available at the following link. The webpage uses lazy loading, so more content becomes visible on each scroll. My script can handle that too.
However, the problem is that when my script exhausts the page's content by reaching the bottom, it gets stuck right there. Once it breaks out of the loop, I can fetch the content. How can I break out of the loop?
I know .LoadingDots is always there, and that is the only reason I can't find any logic to break the loop.
Here is what I've tried so far: (couldn't get rid of the loop)
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
wait = WebDriverWait(driver,10)
driver.get("https://www.quora.com/topic/American-Football")

while True:
    try:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        wait.until(EC.invisibility_of_element_located((By.CSS_SELECTOR, ".LoadingDots")))
    except Exception: break

for item in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".ui_qtext_rendered_qtext .ui_qtext_para"))):
    print(item.text)

driver.quit()
I know I can solve the issue with the following approach:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

driver = webdriver.Chrome()
wait = WebDriverWait(driver,10)
driver.get("https://www.quora.com/topic/American-Football")

last_len = len(wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".ui_qtext_rendered_qtext .ui_qtext_para"))))

while True:
    for load_more in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "a[id$='_more']"))):
        driver.execute_script("arguments[0].click();",load_more)

    try:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        wait.until(lambda driver: len(wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".ui_qtext_rendered_qtext .ui_qtext_para")))) > last_len)
        items = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".ui_qtext_rendered_qtext .ui_qtext_para")))
        last_len = len(items)
    except TimeoutException: break

for item in items:
    print(item.text)

driver.quit()
My question is: how can I fetch the content from that page, exhausting all the scrolls, the way I tried in my first script using .LoadingDots?
Upvotes: 3
Views: 406
Reputation: 52665
Your script doesn't work as expected because the (By.CSS_SELECTOR, ".LoadingDots") selector returns the element <div class="LoadingDots tiny">, which is always hidden, so the wait for its invisibility always returns True and the loop is never broken.
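One way to confirm which element a selector actually picks up is to list every match together with its class attribute and visibility. The helper below is only a sketch: `describe_matches` is a hypothetical name, and it passes the literal string "css selector" (the value of `By.CSS_SELECTOR`) so that it has no imports of its own.

```python
CSS_SELECTOR = "css selector"  # the string value of selenium's By.CSS_SELECTOR


def describe_matches(driver, selector):
    """Return (class attribute, is_displayed) for every element matching selector."""
    elements = driver.find_elements(CSS_SELECTOR, selector)
    return [(el.get_attribute("class"), el.is_displayed()) for el in elements]
```

Calling `describe_matches(driver, ".LoadingDots")` on the loaded page and printing the result should show whether the first match is the hidden one.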
You need to check another element with the "LoadingDots" class name, <div class="LoadingDots regular">, and the logic should be as follows: if we see no dots after the page is scrolled, break the loop.
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
wait = WebDriverWait(driver, 5)
driver.get("https://www.quora.com/topic/American-Football")

while True:
    try:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        # Dots appear while a new batch is loading, then disappear again
        wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".LoadingDots.regular")))
        wait.until(EC.invisibility_of_element_located((By.CSS_SELECTOR, ".LoadingDots.regular")))
    except Exception: break  # no dots seen within the timeout: nothing more to load

for item in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".ui_qtext_rendered_qtext .ui_qtext_para"))):
    print(item.text)

driver.quit()
BUT! Note that I've posted this script only to point out why your script is not working. It's not really reliable: if content loads too fast (unlikely, but possible), the script might miss the moment when the loading dots appear, and you won't get all the required content.
So @Guy's solution seems to be more reliable (+1)
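If relying on the dots feels too fragile, a different and commonly used stop condition is to compare `document.body.scrollHeight` before and after each scroll. This is not the approach from either answer here, just a sketch under assumptions: `scroll_until_height_stable`, `pause` and `max_scrolls` are hypothetical names, and the fixed pause may be too short on a slow connection.

```python
import time


def scroll_until_height_stable(driver, pause=2.0, max_scrolls=200):
    """Scroll to the bottom until document.body.scrollHeight stops growing.

    Returns the number of scrolls that loaded new content.
    """
    last_height = driver.execute_script("return document.body.scrollHeight;")
    loaded = 0
    for _ in range(max_scrolls):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the lazy loader time to append content
        new_height = driver.execute_script("return document.body.scrollHeight;")
        if new_height == last_height:
            break  # nothing was added, so assume the page is exhausted
        last_height = new_height
        loaded += 1
    return loaded
```

With a real driver this would be called as `scroll_until_height_stable(driver)` right after `driver.get(...)`, before collecting the paragraphs.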
Upvotes: 0
Reputation: 50819
When the page is scrolled to the bottom, the element with classes .LoadingDots.regular remains the same, but its parent element gets a new class, hidden. You can check whether the class was added using the get_attribute function. You can also locate the parent directly by its class, spinner_display_area:
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # find_element(By.CLASS_NAME, ...) also works on Selenium 4; By is imported in the question's script
    loading_dots = driver.find_element(By.CLASS_NAME, 'spinner_display_area')
    if 'hidden' in loading_dots.get_attribute('class'):
        break
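The same check can be wrapped in a bounded loop so that it cannot spin forever if the class never appears. This is a sketch rather than a drop-in replacement: `scroll_until_spinner_hidden`, `pause` and `max_scrolls` are hypothetical names, and it passes the literal string "class name" (the value of `By.CLASS_NAME`) to stay free of imports from selenium.

```python
import time

CLASS_NAME = "class name"  # the string value of selenium's By.CLASS_NAME


def scroll_until_spinner_hidden(driver, pause=1.0, max_scrolls=500):
    """Scroll until the spinner container carries the 'hidden' class.

    Returns True if the class was seen, False if max_scrolls ran out first.
    """
    for _ in range(max_scrolls):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        spinner = driver.find_element(CLASS_NAME, "spinner_display_area")
        classes = (spinner.get_attribute("class") or "").split()
        if "hidden" in classes:  # exact class token, not a substring match
            return True
        time.sleep(pause)  # let the next batch load before scrolling again
    return False
```

Splitting the attribute into tokens avoids a false match on a class name that merely contains the substring "hidden".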
Upvotes: 2