Reputation: 111
I have a block of code that crawls an infinite-scroll website (like Facebook).
The Python Selenium script executes JavaScript to scroll to the bottom of the page so that more content loads further down. But eventually the loop runs ahead of the page loads (effectively asynchronously), and the website's rate limiter blocks the script.
I need the script to wait for the page to load first and then continue, but I have not managed to do that.
Here is what I have tried so far. The code goes as follows:
while int(number_of_news) != int(len(news)):
    driver.execute_script("window.scrollTo(document.body.scrollHeight/2, document.body.scrollHeight);")
    news = driver.find_elements_by_class_name("news-text")
    print(len(news))
The output is a sequence of counts in which the same value is printed several times, which I interpreted as the loop executing multiple times while the count is still 43, 63, and so on.
I also tried making it recursive, but the result is still the same. The recursive code is as follows:
def call_news(_driver, _news, _number_of_news):
    _driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    _news = _driver.find_elements_by_class_name("news-text")
    print(len(_news))
    if int(len(_news)) != int(_number_of_news):
        return call_news(_driver, _news, _number_of_news)
    else:
        return _news
Any kind of tip is appreciated.
Upvotes: 4
Views: 2123
Reputation: 50909
You can set the page load timeout to make the driver wait for the page to load:

driver.set_page_load_timeout(10)
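As a sketch of how that timeout surfaces (assuming an illustrative URL and a 10-second limit), a navigation that takes longer than the limit raises a TimeoutException that you can catch:

from selenium import webdriver
from selenium.common.exceptions import TimeoutException

driver = webdriver.Chrome()
driver.set_page_load_timeout(10)  # abort page loads that take longer than 10 seconds

try:
    driver.get("https://example.com/news")  # hypothetical URL, for illustration only
except TimeoutException:
    # The page did not finish loading within 10 seconds;
    # retry, scroll anyway, or back off before the rate limiter kicks in.
    print("Page load timed out")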
Another option is to wait for the number of elements to change:

current_number_of_news = 0
news = []
while int(number_of_news) != int(len(news)):
    driver.execute_script("window.scrollTo(document.body.scrollHeight/2, document.body.scrollHeight);")
    while current_number_of_news == len(news):
        news = driver.find_elements_by_class_name("news-text")
    current_number_of_news = len(news)
    print(len(news))
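If you prefer an explicit wait over the busy inner loop, the same idea can be expressed with WebDriverWait and a custom condition (a sketch, assuming the same "news-text" class name and an illustrative 15-second timeout):

from selenium.webdriver.support.ui import WebDriverWait
from selenium.common.exceptions import TimeoutException

wait = WebDriverWait(driver, 15)  # illustrative timeout in seconds
news = driver.find_elements_by_class_name("news-text")

while int(number_of_news) != len(news):
    previous_count = len(news)
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    try:
        # Block until more "news-text" elements are present than before the scroll
        wait.until(lambda d: len(d.find_elements_by_class_name("news-text")) > previous_count)
    except TimeoutException:
        break  # nothing new loaded within the timeout; stop scrolling
    news = driver.find_elements_by_class_name("news-text")
    print(len(news))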
Upvotes: 4