Salman Haseeb Sheikh

Reputation: 1162

How to scrape websites with infinite scrolling and a "load more" button using Python and Selenium

I want to scrape Facebook's mbasic.facebook.com interface. It has a "load more" button that loads additional posts. I have done a lot of research on scraping Facebook's regular interface and found this code in "Scraping infinite scrolling website with Selenium in Python":

import time
import unittest

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys


class Sel(unittest.TestCase):
    def setUp(self):
        self.driver = webdriver.Chrome()
        self.driver.implicitly_wait(30)
        self.verificationErrors = []
        self.accept_next_alert = True

    def test_sel(self):
        driver = self.driver
        driver.get("https://www.facebook.com")

        # Fill in the login form (credentials left blank here).
        elem = driver.find_element(By.NAME, "email")
        elem.clear()
        elem.send_keys("")

        elem2 = driver.find_element(By.NAME, "pass")
        elem2.clear()
        elem2.send_keys("")
        elem2.send_keys(Keys.RETURN)

        # Scroll to the bottom repeatedly so that more posts get loaded.
        for i in range(1, 100):
            driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
            time.sleep(4)

        html_source = driver.page_source
        data = html_source.encode('utf-8')
        print(data)


if __name__ == "__main__":
    unittest.main()

But I don't want to use a loop. Instead, I would like to trigger an event, as if a user had manually pressed the "load more posts" button, so that the new page loads and I can get its page source. Is there any way to do that? Any help would be appreciated.

Upvotes: 0

Views: 2447

Answers (1)

Mangohero1

Reputation: 1912

Are you trying to get the page source each time you load more posts? The code you posted doesn't do that. Assuming you want the source each time a new batch of posts loads, you can locate and click the "More Posts" button using an XPath:

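# Needs `import time` and `from selenium.webdriver.common.by import By`.
for i in range(1, 10):
    driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')
    # Click the parent element of the span whose text contains "More".
    driver.find_element(By.XPATH, '//span[contains(., "More")]/..').click()
    time.sleep(4)  # give the next batch of posts time to load
    html_source = driver.page_source
    data = html_source.encode('utf-8')
    print(data)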
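
If a fixed number of iterations is too rigid, one option is to keep clicking until the button is no longer on the page. The following is only a sketch under assumptions: `collect_pages`, `MORE_XPATH`, `max_pages`, and `pause` are hypothetical names, the XPath is the one from the loop above and may not match the real page, and `driver` is assumed to be a Selenium WebDriver that is already logged in and showing the first page.

```python
import time

# Hypothetical locator, taken from the loop above; adjust it to the real page.
MORE_XPATH = '//span[contains(., "More")]/..'


def collect_pages(driver, max_pages=10, pause=4.0):
    """Return a list with the page source before and after each click.

    Stops as soon as the button is missing, or after `max_pages` clicks.
    """
    sources = [driver.page_source]
    for _ in range(max_pages):
        # "xpath" is the string value of Selenium's By.XPATH constant.
        buttons = driver.find_elements("xpath", MORE_XPATH)
        if not buttons:
            break  # no button left, so there is nothing more to load
        buttons[0].click()
        time.sleep(pause)  # crude fixed wait; an explicit wait is more robust
        sources.append(driver.page_source)
    return sources
```

Calling `collect_pages(driver)` after logging in would return one string per loaded page. `find_elements` is used instead of `find_element` because it returns an empty list when nothing matches, rather than raising an exception.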

Upvotes: 2
