Reputation: 1162
I want to scrape Facebook's mbasic.facebook.com interface. It has a "load more" button that brings in new posts. I have been researching how to scrape Facebook's regular interface and found this code in "Scraping infinite scrolling website with Selenium in Python":
import time
import unittest

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys


class Sel(unittest.TestCase):
    def setUp(self):
        self.driver = webdriver.Chrome()
        self.driver.implicitly_wait(30)
        self.verificationErrors = []
        self.accept_next_alert = True

    def test_sel(self):
        driver = self.driver
        delay = 3
        driver.get("https://www.facebook.com")
        # Log in (credentials left blank here)
        elem = driver.find_element(By.NAME, "email")
        elem.clear()
        elem.send_keys("")
        elem2 = driver.find_element(By.NAME, "pass")
        elem2.clear()
        elem2.send_keys("")
        elem2.send_keys(Keys.RETURN)
        # Scroll to the bottom repeatedly so more posts load
        for i in range(1, 100):
            self.driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
            time.sleep(4)
        # Read the page source once, after all the scrolling
        html_source = driver.page_source
        data = html_source.encode('utf-8')
        print(data)


if __name__ == "__main__":
    unittest.main()
But I don't want to scroll in a blind loop. I would rather trigger an event: when the "load more posts" button is pressed, as if a user had clicked it, the new page loads and I get its page source. Is there any way to do that? Any help would be appreciated.
Upvotes: 0
Views: 2447
Reputation: 1912
Are you trying to get the page source each time more posts load? The code you posted doesn't do that. Assuming you want the source each time a new batch of posts loads, you can locate the "More Posts" button with an XPath and click it inside the loop:
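import time
from selenium.webdriver.common.by import By

# `driver` is the logged-in WebDriver from the question's code
for i in range(1, 10):
    driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')
    # Click the parent of the <span> whose text contains "More"
    driver.find_element(By.XPATH, '//span[contains(., "More")]/..').click()
    html_source = driver.page_source
    data = html_source.encode('utf-8')
    print(data)
    time.sleep(4)  # give the next batch of posts time to load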
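If you would rather not hard-code the number of iterations, one possible variation is to wrap the click-and-read step in a generator that stops when no "More" link is left. This is only a sketch: `iter_page_sources`, `find_more_link`, and the `find_more` callback are names I made up for illustration, and the XPath is the same guess as above, so adjust it to whatever mbasic actually renders.

```python
import time


def iter_page_sources(driver, find_more, max_clicks=10, pause=0.0):
    """Yield driver.page_source once per loaded page, clicking "more" in between.

    find_more(driver) is a hypothetical callback that returns the clickable
    "load more" element, or None when there is nothing left to load.
    """
    yield driver.page_source  # the page as first loaded
    for _ in range(max_clicks):
        button = find_more(driver)
        if button is None:
            break  # no further "more" link: stop instead of looping blindly
        button.click()
        if pause:
            time.sleep(pause)  # crude delay so the next batch can load
        yield driver.page_source


def find_more_link(driver):
    """Selenium-backed find_more, reusing the XPath from the loop above."""
    # Imported lazily so the generator itself has no Selenium dependency.
    from selenium.common.exceptions import NoSuchElementException
    from selenium.webdriver.common.by import By

    try:
        return driver.find_element(By.XPATH, '//span[contains(., "More")]/..')
    except NoSuchElementException:
        return None
```

With a logged-in `driver` you would consume it as `for html in iter_page_sources(driver, find_more_link, pause=4): print(html.encode('utf-8'))`. Note that with `implicitly_wait(30)` set, the final lookup waits the full 30 seconds before concluding the link is gone, so you may want a shorter implicit wait.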
Upvotes: 2