LearningPython
LearningPython

Reputation: 21

Can't View Complete Page Source in Selenium

When I view the source HTML after manually navigating to the site via Chrome I can see the full page source but on loading the page source via selenium I'm not getting the complete page source.

from bs4 import BeautifulSoup
from selenium import webdriver
import sys,time


driver = webdriver.Chrome(executable_path=r"C:\Python27\Scripts\chromedriver.exe")
driver.get('http://www.magicbricks.com/')


driver.find_element_by_id("buyTab").click()

time.sleep(5)
driver.find_element_by_id("keyword").send_keys("Navi Mumbai")

time.sleep(5)
driver.find_element_by_id("btnPropertySearch").click()

time.sleep(30)

content = driver.page_source.encode('utf-8').strip()

soup = BeautifulSoup(content,"lxml")

print soup.prettify()

Upvotes: 2

Views: 8075

Answers (2)

ghchoi
ghchoi

Reputation: 5156

Try something like:

import time
time.sleep(5)
content = driver.execute_script("return document.getElementsByTagName('html')[0].innerHTML")

instead of driver.page_source.

Dynamic web pages are often needed to be rendered by JavaScript.

Upvotes: 1

Leroy Jenkins
Leroy Jenkins

Reputation: 68

The website is possibly blocking or restricting the user agent for selenium. An easy test is to change the user agent and see if that does it. More info at this question:

Change user agent for selenium driver

Quoting:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
opts = Options()
opts.add_argument("user-agent=whatever you want")

driver = webdriver.Chrome(chrome_options=opts)

Upvotes: 1

Related Questions