Dinesh Ahuja

Reputation: 1025

Twitter scroll down of all posts using Selenium Python

I am using Selenium with Python and trying to scroll to the bottom of a Twitter page, but it doesn't scroll all the way to the end. It stops partway and Twitter shows a "back to top" message. It doesn't even show all of the page's posts from the last month. This is my code:

users = ['BBCWorld']

    username = browser.find_element_by_class_name("js-username-field")
    username.send_keys("username")
    password = browser.find_element_by_class_name("js-password-field")
    password.send_keys("password")

    signin_click = WebDriverWait(browser, 500000).until(
            EC.element_to_be_clickable((By.XPATH, '//*[@id="page-container"]/div/div[1]/form/div[2]/button'))
        )
    signin_click.click()

    for user in users:
        # User's profile
        browser.get('https://twitter.com/' + user)

        time.sleep(0.5)

        SCROLL_PAUSE_TIME = 0.5

        # Get scroll height
        last_height = browser.execute_script("return document.body.scrollHeight")

        while True:
            # Scroll down to bottom
            browser.execute_script("window.scrollTo(0, document.body.scrollHeight)")

            # Wait to load page
            time.sleep(SCROLL_PAUSE_TIME)


            # Calculate new scroll height and compare with last scroll height
            new_height = browser.execute_script("return document.body.scrollHeight")



        # Quit browser
        browser.quit()

Upvotes: 1

Views: 2678

Answers (1)

Andrei

Reputation: 5647

You forgot the break condition and the `last_height` update at the end of the loop:

while True:
    # Scroll down to bottom
    browser.execute_script("window.scrollTo(0, document.body.scrollHeight)")

    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)


    # Calculate new scroll height and compare with last scroll height
    new_height = browser.execute_script("return document.body.scrollHeight")

    # break condition
    if new_height == last_height:
        break
    last_height = new_height

Also, SCROLL_PAUSE_TIME = 0.5 is quite short, and Twitter slows down as the number of posts to load grows, so you need to increase the pause. I would try SCROLL_PAUSE_TIME = 2.

PS: a hard-coded pause is not very efficient. Instead, you can try to locate the spinner (or whatever element Twitter shows while it loads new content) and wait until it disappears. That would be more elegant.
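I don't know the spinner's selector, so here is a rough sketch of a related idea instead: rather than sleeping for a fixed time, poll the page height until it grows or a timeout expires. The helper names `wait_for_new_height` and `scroll_to_end`, and the `timeout` and `poll` defaults, are my own assumptions and not part of Selenium; `browser` only needs an `execute_script` method, as a Selenium WebDriver has.

```python
import time


def wait_for_new_height(browser, last_height, timeout=10.0, poll=0.25):
    """Poll the page height until it exceeds last_height or timeout expires.

    Returns the new height, or last_height if nothing loaded in time.
    (Hypothetical helper, not a Selenium API.)
    """
    deadline = time.monotonic() + timeout
    while True:
        height = browser.execute_script("return document.body.scrollHeight")
        if height > last_height:
            return height
        if time.monotonic() >= deadline:
            return last_height
        time.sleep(poll)


def scroll_to_end(browser, timeout=10.0, poll=0.25):
    """Scroll down until the page height stops growing; return the final height."""
    last_height = browser.execute_script("return document.body.scrollHeight")
    while True:
        # Scroll down to bottom
        browser.execute_script("window.scrollTo(0, document.body.scrollHeight)")

        # Wait only as long as needed for new content to appear
        new_height = wait_for_new_height(browser, last_height, timeout, poll)

        # Break condition: no growth within the timeout
        if new_height == last_height:
            return last_height
        last_height = new_height
```

With this, fast loads are not delayed by a long fixed sleep, and slow loads get up to `timeout` seconds before the loop gives up. It still cannot tell "Twitter is slow" from "Twitter has stopped serving older posts", so the timeout remains a judgment call.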

Upvotes: 1
