Reputation: 424
I'm parsing online-shop pages that contain a list of <div class="classname" ...> tags, 24 per page, for example.
But some elements load in time and some do not. WebDriver (Chrome) finds only the 4-6 elements that loaded normally, which look like this:
<div class="classname">
<div class="abcd">...</div>
</div>
and 18-20 elements that look like this:
<div class="classname" ...><!-- --></div>
(not loaded).
So I use driver.find_elements_by_class_name("abcd"), and it returns only 4-6 elements.
How can I wait until the whole list is loaded, using WebDriverWait.until or implicitly_wait?
(There are no other elements that could be waited for. All other parts of the page load fully and correctly.)
Or how can I simply delay for a few seconds without any condition and get the final version of the page in the WebDriver object? (driver.implicitly_wait(10) does delay, as far as I can see, but the WebDriver object still does not contain the full data.)
Update: It is strange to me, but using WebDriverWait, time.sleep(), or driver.refresh() does not update driver.page_source. It stays in the incompletely loaded state. Code:
self.driver.get(url_)
time.sleep(15)
number_of_elements = len(self.driver.find_elements_by_class_name("product-cards-layout__item"))  # len: 24
while True:
    xpath = "//div[@class=\"product-card--mobile\"]"
    condition = EC.presence_of_all_elements_located((By.XPATH, xpath))
    try:
        wait = WebDriverWait(self.driver, 10).until(condition)  # len: 6
    except Exception:
        pass
    print(len(wait))  # 6
    if len(wait) == number_of_elements:
        break
    else:
        self.driver.refresh()
exit_ = self.driver.page_source
So driver.page_source contains the HTML below:
<div class="product-cards-layout__item"><div class="product-card--mobile__info"></div></div>
<div class="product-cards-layout__item"><div class="product-card--mobile__info"></div></div>
<div class="product-cards-layout__item"><div class="product-card--mobile__info"></div></div>
... (6 times)
<div class="product-cards-layout__item"><!-- --></div>
<div class="product-cards-layout__item"><!-- --></div>
<div class="product-cards-layout__item"><!-- --></div>
... (18 times)
In total, 24 tags.
But in the opened Chrome window I see all the information I need (24 complete tags, each containing <div class="product-card--mobile__info">). Both when running the script and in the debugger, the contents of .page_source stay static.
If I don't use .refresh(), the result is the same: it is static and does not correspond to the data shown in the browser. And it stays static over hundreds of loop iterations.
Upvotes: 0
Views: 850
Reputation: 424
My problem was that the site does not load a block until it is in view in the browser. The solution is to scroll to every div you need:
self.driver.get(url_)
product_elements = self.driver.find_elements_by_class_name("product-cards")
for elm in product_elements:
    # Reading this property scrolls the element into view as a side effect.
    elm.location_once_scrolled_into_view
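A slightly more defensive variant of the same idea, as a rough sketch: scroll each placeholder into view and then poll until its inner content appears before moving on. The function name `scroll_and_collect`, its parameters, and the CSS selectors in the usage note are my own assumptions, not something from the site or from Selenium; the only driver calls it relies on are `find_elements(by, value)` and `execute_script`.

```python
import time

# Same string value as selenium's By.CSS_SELECTOR, written out here so the
# sketch has no import-time dependency on selenium.
CSS = "css selector"


def scroll_and_collect(driver, item_css, loaded_css, timeout=10.0, poll=0.2):
    """Scroll every item into view and wait until its inner content exists.

    item_css   -- selector for the outer placeholder divs (hypothetical)
    loaded_css -- selector for the child that appears once a div is loaded
    """
    items = driver.find_elements(CSS, item_css)
    for item in items:
        # Bring the placeholder into the viewport so the site's lazy loader runs.
        driver.execute_script(
            "arguments[0].scrollIntoView({block: 'center'});", item
        )
        deadline = time.monotonic() + timeout
        # Poll the element itself until the expected child shows up.
        while not item.find_elements(CSS, loaded_css):
            if time.monotonic() > deadline:
                raise TimeoutError("item did not load after scrolling into view")
            time.sleep(poll)
    return items
```

With the class names from the question, the call would presumably look like scroll_and_collect(self.driver, ".product-cards-layout__item", ".product-card--mobile__info"), but I have not checked those selectors against the real page.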
Upvotes: 0
Reputation: 573
If you know the number of elements per page, you can use this function to wait until all expected elements are present:
import time

from selenium.common.exceptions import TimeoutException, StaleElementReferenceException

def wait_until_all_expected_elements(func, number_of_elements, timeout=30):
    endtime = time.time() + timeout
    while True:
        try:
            if time.time() > endtime:
                raise TimeoutException("The function doesn't return a sufficient number of elements")
            elements = func()
            if len(elements) == number_of_elements:
                return elements
        except StaleElementReferenceException:
            pass
        time.sleep(0.5)  # short pause so the loop does not spin at full speed
where number_of_elements is the number of elements the page contains. Then get the elements with WebDriverWait.until:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as ec

def get_elements(driver):
    # path_to_element is the XPath string of the target elements, defined elsewhere.
    wait = WebDriverWait(driver, 10)
    return wait.until(ec.presence_of_all_elements_located((By.XPATH, path_to_element)))
and pass the function to wait_until_all_expected_elements as follows:
elements = wait_until_all_expected_elements(lambda: get_elements(driver), number_of_elements)
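The two steps can also be folded into one custom wait condition, since WebDriverWait.until accepts any callable that takes the driver and returns a truthy value when done. This is only a sketch: `count_equals` is a name I made up, and it assumes the expected count is greater than zero (an empty list is falsy, so the wait would never finish for zero).

```python
def count_equals(by, value, expected):
    """Build a condition for WebDriverWait.until (hypothetical helper).

    The returned callable yields the element list once exactly `expected`
    elements match the locator, and False otherwise so the wait keeps polling.
    """
    def _condition(driver):
        elements = driver.find_elements(by, value)
        return elements if len(elements) == expected else False

    return _condition
```

It would be used as WebDriverWait(driver, 30).until(count_equals(By.XPATH, path_to_element, number_of_elements)). As far as I know, WebDriverWait ignores NoSuchElementException by default and raises TimeoutException when the time runs out, so no extra loop is needed.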
Upvotes: 1
Reputation: 58
I've had the same issue with web scraping. Try using Python's built-in time module. Just call time.sleep(number_of_seconds) so the site has time to load, and then you can look for what you need.
import time

driver.get(your_website_here)
time.sleep(5)  # Wait 5 seconds for the page to fully load
driver.find_elements_by_class_name("abcd")
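If the right number of seconds is hard to guess, one alternative to a fixed sleep is to poll until the element count stops changing. The sketch below is an assumption on my part, not a Selenium feature: `wait_for_stable_count` and its parameters are hypothetical names, and it only helps when the content keeps loading on its own (it will not trigger content that loads only after scrolling).

```python
import time


def wait_for_stable_count(count_fn, settle=1.0, timeout=15.0, poll=0.25):
    """Poll count_fn() until its value is unchanged for `settle` seconds.

    count_fn -- zero-argument callable returning the current element count
    Returns the settled count, or raises TimeoutError after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    last = count_fn()
    last_change = time.monotonic()
    while time.monotonic() < deadline:
        time.sleep(poll)
        current = count_fn()
        now = time.monotonic()
        if current != last:
            # The page is still changing, so restart the settle window.
            last, last_change = current, now
        elif now - last_change >= settle:
            return current
    raise TimeoutError("element count did not settle in time")
```

For example: wait_for_stable_count(lambda: len(driver.find_elements_by_class_name("abcd"))), followed by the usual find call.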
Upvotes: 0