Reputation: 21
When I view the source HTML after manually navigating to the site via Chrome I can see the full page source but on loading the page source via selenium I'm not getting the complete page source.
from bs4 import BeautifulSoup
from selenium import webdriver
import sys,time
driver = webdriver.Chrome(executable_path=r"C:\Python27\Scripts\chromedriver.exe")
driver.get('http://www.magicbricks.com/')
driver.find_element_by_id("buyTab").click()
time.sleep(5)
driver.find_element_by_id("keyword").send_keys("Navi Mumbai")
time.sleep(5)
driver.find_element_by_id("btnPropertySearch").click()
time.sleep(30)
content = driver.page_source.encode('utf-8').strip()
soup = BeautifulSoup(content,"lxml")
print soup.prettify()
Upvotes: 2
Views: 8075
Reputation: 5156
Try something like:
import time
time.sleep(5)
content = driver.execute_script("return document.getElementsByTagName('html')[0].innerHTML")
instead of driver.page_source
.
Dynamic web pages are often needed to be rendered by JavaScript.
Upvotes: 1
Reputation: 68
The website is possibly blocking or restricting the user agent for selenium. An easy test is to change the user agent and see if that does it. More info at this question:
Change user agent for selenium driver
Quoting:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
opts = Options()
opts.add_argument("user-agent=whatever you want")
driver = webdriver.Chrome(chrome_options=opts)
Upvotes: 1