Reputation: 22440
I've written a script in Python to scrape names from a slow-loading webpage. There are 1000 names on that page, and the full content only loads once the browser is scrolled all the way to the bottom. My script can successfully reach the lowest portion of the page and parse all the names. The issue I'm facing is that I've used a hardcoded delay (5 seconds in this case), which makes the browser wait unnecessarily even when the items have already loaded. So how can I use an explicit wait to overcome this and still parse all the items?
Here is the script I've written so far:
from selenium import webdriver
import time

driver = webdriver.Chrome()
driver.get("http://fortune.com/fortune500/list/")

check_height = driver.execute_script("return document.body.scrollHeight;")
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(5)
    height = driver.execute_script("return document.body.scrollHeight;")
    if height == check_height:
        break
    check_height = height

listElements = driver.find_elements_by_css_selector(".company-title")
for item in listElements:
    print(item.text)
Upvotes: 2
Views: 124
Reputation: 52665
You can add an explicit wait as below:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.common.exceptions import TimeoutException

driver = webdriver.Chrome()
driver.get("http://fortune.com/fortune500/list/")

check_height = driver.execute_script("return document.body.scrollHeight;")
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    try:
        # Wait until the page grows taller than it was before this scroll
        WebDriverWait(driver, 10).until(
            lambda driver: driver.execute_script("return document.body.scrollHeight;") > check_height)
        check_height = driver.execute_script("return document.body.scrollHeight;")
    except TimeoutException:
        # Height stayed constant for 10 seconds: we've reached the bottom
        break

listElements = driver.find_elements_by_css_selector(".company-title")
for item in listElements:
    print(item.text)
This should allow you to avoid hardcoding time.sleep(): instead you wait only as long as it takes the height value to change, and break out of the loop once the height has stayed constant for 10 seconds after a scroll.
Upvotes: 1
Reputation: 621
You need to use explicit waits, like this:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox()
driver.get("http://somedomain/url_that_delays_loading")
try:
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, "myDynamicElement"))
    )
finally:
    driver.quit()
More details here: http://selenium-python.readthedocs.io/waits.html
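Beyond the built-in conditions in expected_conditions, WebDriverWait accepts any callable that takes the driver and returns a truthy value, so you can write your own. A hedged sketch (the class name, selector, and count are illustrative, not Selenium APIs), demonstrated with a tiny fake driver so the logic is visible without a live browser:

```python
class elements_count_at_least:
    """Custom expected condition: truthy once the locator matches
    at least `count` elements, so WebDriverWait can poll it."""

    def __init__(self, locator, count):
        self.locator = locator
        self.count = count

    def __call__(self, driver):
        elements = driver.find_elements(*self.locator)
        return elements if len(elements) >= self.count else False

# With a real driver you would write something like:
# names = WebDriverWait(driver, 30).until(
#     elements_count_at_least(("css selector", ".company-title"), 1000))

# Tiny fake driver (duck-typed) to demonstrate the condition's logic:
class FakeDriver:
    def __init__(self, found):
        self.found = found

    def find_elements(self, by, value):
        return self.found

condition = elements_count_at_least(("css selector", ".company-title"), 3)
print(bool(condition(FakeDriver(["a", "b"]))))       # False: only 2 matches
print(bool(condition(FakeDriver(["a", "b", "c"]))))  # True: 3 matches found
```

Returning the element list (rather than just True) means the value you get back from `until` is the list itself, ready to iterate over.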
Upvotes: 0