MegaMind_2

Reputation: 111

Running loop synchronously in python

I have a block of code that crawls an infinite-scroll website (like Facebook).

My Python Selenium script uses JavaScript to scroll to the bottom of the page so that more content loads. However, the loop seems to run ahead of the page (asynchronously), and eventually the website's rate limiter blocks the script.

I need the script to wait for the page to load before continuing, but I have not managed to do that.

Here is what I have tried so far. The loop looks like this:

from selenium.webdriver.common.by import By

news = []
while int(number_of_news) != len(news):
    # Scroll to the bottom so the site loads the next batch
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    news = driver.find_elements(By.CLASS_NAME, "news-text")
    print(len(news))

The output (a screenshot in the original post) prints the same count several times, which I interpreted as the loop running multiple times while the count is 43, 63, and so on.

I also tried a recursive version, but the result is the same. The recursive code is:

from selenium.webdriver.common.by import By

def call_news(_driver, _news, _number_of_news):
    _driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    _news = _driver.find_elements(By.CLASS_NAME, "news-text")
    print(len(_news))
    if len(_news) != int(_number_of_news):
        # Return the recursive result, otherwise the caller receives None
        return call_news(_driver, _news, _number_of_news)
    return _news

Any kind of tip is appreciated.

Upvotes: 4

Views: 2123

Answers (1)

Guy

Reputation: 50909

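You can set page_load_timeout to control how long the driver waits for a page load (such as driver.get()) to complete before raising a TimeoutException. Note that it covers navigation, not content loaded in the background after a scroll: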

driver.set_page_load_timeout(10)

Another option is to wait until the number of elements changes after each scroll:

import time
from selenium.webdriver.common.by import By

current_number_of_news = 0
news = []
while int(number_of_news) != len(news):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # Poll until the element count changes, i.e. until the next batch has loaded
    while current_number_of_news == len(news):
        time.sleep(0.5)  # short pause between checks instead of a busy loop
        news = driver.find_elements(By.CLASS_NAME, "news-text")
    current_number_of_news = len(news)
    print(len(news))
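A loop that waits for the count to change never exits if the feed runs out (or the site stops serving items) before reaching the expected total. A minimal sketch that adds a time limit is below. It is written against two hypothetical callables, `scroll` and `count_items`, so it does not depend on a particular Selenium version; with Selenium you might pass `lambda: driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")` and `lambda: len(driver.find_elements(By.CLASS_NAME, "news-text"))`. The function name `load_all_items` and its `timeout` / `poll_interval` defaults are assumptions, not values from the answer.

```python
import time
from typing import Callable


def load_all_items(
    scroll: Callable[[], None],
    count_items: Callable[[], int],
    target: int,
    timeout: float = 10.0,
    poll_interval: float = 0.5,
) -> int:
    """Scroll until `target` items are loaded or no new items arrive.

    Hypothetical helper: after each scroll it polls `count_items` until the
    count grows, so the next scroll only happens once the previous batch has
    loaded. If the count does not grow within `timeout` seconds, it assumes
    the feed has ended and stops instead of looping forever.
    """
    loaded = count_items()
    while loaded < target:
        scroll()
        deadline = time.monotonic() + timeout
        new_count = count_items()
        while new_count <= loaded and time.monotonic() < deadline:
            time.sleep(poll_interval)  # give the page time to fetch the next batch
            new_count = count_items()
        if new_count <= loaded:
            break  # nothing new before the deadline: end of feed (or blocked)
        loaded = new_count
        print(loaded)
    return loaded
```

Selenium's own `WebDriverWait(driver, timeout).until(condition)` implements the same poll-until-true-or-timeout pattern (raising `TimeoutException` when the time runs out) and could replace the inner loop.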

Upvotes: 4
