Reputation: 41
I use selenium.webdriver to log in to Facebook and fetch the HTML of a public figure's page, such as https://www.facebook.com/DonaldTrump/?fref=ts, so that I can crawl the post content from it.
I found that selenium.webdriver only returns the content currently visible on screen. For example, after logging in and requesting https://www.facebook.com/DonaldTrump/?fref=ts, I get only the few posts in the current screen, although the page actually contains many more.
In a browser I would have to roll the mouse wheel many times before the page reaches its bottom, but my program only gets the limited content of the current screen. Could you please tell me how to solve this, or suggest another method or library besides Selenium that can log in to Facebook and get all the content of the target page (not only the content in the current screen)?
The program that I wrote is:
    import time  # needed for time.sleep below

    import requests
    from selenium import webdriver
    from selenium.webdriver.common.keys import Keys

    FACEBOOK_URL_PREFIX = "https://www.facebook.com/"

    # glovar is defined elsewhere and is not shown in this post
    def web_public_figure(self, p_figure_name):
        # remove the spaces from p_figure_name
        p_figure_name_arr = p_figure_name.split(" ")
        p_figure_name_str = "".join(p_figure_name_arr)
        params = r"/?fref=ts"
        p_f_web_url = FACEBOOK_URL_PREFIX + p_figure_name_str + params

        # log in to the website
        login_url = "https://www.facebook.com/login.php?login_attempt=1&lwv=110"
        glovar.webdriver_browser = webdriver.Chrome()
        glovar.webdriver_browser.get(login_url)

        # user credentials
        user = glovar.webdriver_browser.find_element_by_css_selector("#email")
        user.send_keys('[email protected]')
        password = glovar.webdriver_browser.find_element_by_css_selector("#pass")
        password.send_keys('expectopatronum')
        login = glovar.webdriver_browser.find_element_by_css_selector("#loginbutton")
        login.click()

        # the login may fail and land back on the login page
        if "login" in glovar.webdriver_browser.current_url:
            glovar.webdriver_browser.close()
            return None  # the window is closed, so there is nothing left to fetch

        time.sleep(10)
        glovar.webdriver_browser.get(p_f_web_url)
        html_p_f_page = glovar.webdriver_browser.page_source
        return html_p_f_page
p_figure_name is "Donald trump", but html_p_f_page contains only part of the page https://www.facebook.com/DonaldTrump/?fref=ts (only the part in the current screen).
The page also seems to have a "see all" button. Could you please tell me how to get all the content of such a page, possibly using a library other than Selenium?
Upvotes: 0
Views: 97
Reputation: 11
You can do it directly in Selenium. It's just a matter of programmatically scrolling the page down. The problem is called infinite scrolling and is extensively described in this answer.
Basically you just have to make the page scroll down by its full height a few times, pausing after each scroll so the next batch of posts can load. Something like this should do, but I recommend you read the whole linked post.
    for i in range(1, 100):
        self.driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(4)
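If you would rather not hard-code the number of scrolls, a common variation is to keep scrolling until the page height stops growing. The following is only a rough sketch under that assumption: scroll_to_bottom, pause and max_rounds are names I made up for illustration, and driver is expected to be a Selenium WebDriver (anything with an execute_script method works). A page that loads slowly may look "finished" too early, so the pause probably needs tuning.

```python
import time


def scroll_to_bottom(driver, pause=4.0, max_rounds=100):
    """Scroll down until the page height stops growing.

    Hypothetical helper: returns the number of scrolls performed.
    max_rounds is a safety cap so the loop always terminates.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:
        # Jump to the current bottom of the page.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        rounds += 1
        # Give the page time to load the next batch of content.
        time.sleep(pause)
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            # Nothing new was loaded, so assume the bottom was reached.
            break
        last_height = new_height
    return rounds
```

After it returns, driver.page_source should contain whatever had been loaded by then, so you would call it between get(p_f_web_url) and reading page_source.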
Upvotes: 1