user11390354

Why does HTML source from Selenium look different than that shown in a web browser’s view?

I am using Python and Selenium to capture the HTML source of a webpage so I can parse it to find a particular element. The source, however, is not the same as what I get when using the “inspect element” view of a browser. The element I am looking for is not in the source Selenium provides. Is there any way to get the same source using Selenium, or using another tool or method?

Upvotes: 0

Views: 1534

Answers (2)

S A

Reputation: 1910

As described in the Selenium documentation:

getPageSource
java.lang.String getPageSource()

Get the source of the last loaded page. If the page has been modified after loading (for example, by Javascript) there is no guarantee that the returned text is that of the modified page. Please consult the documentation of the particular driver being used to determine whether the returned text reflects the current state of the page or the text last sent by the web server. The page source returned is a representation of the underlying DOM: do not expect it to be formatted or escaped in the same way as the response sent from the web server. Think of it as an artist's impression.

Returns: The source of the current page
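
In Python the equivalent of getPageSource() is the driver.page_source property. If the element is added by JavaScript after the initial load, one option is to wait until it exists and then ask the browser for the live DOM. The following is only a minimal sketch: rendered_html and its parameters are made-up names, and the hand-rolled polling loop stands in for Selenium's own WebDriverWait so that the snippet needs nothing beyond the standard library. It assumes driver is any object with Selenium's execute_script(script, *args) method.

```python
import time


def rendered_html(driver, css_selector, timeout=10.0, poll=0.25):
    """Poll until css_selector matches something, then return the live DOM as HTML.

    `driver` is assumed to expose Selenium's execute_script(script, *args).
    """
    deadline = time.monotonic() + timeout
    while True:
        # Ask the browser whether the element exists in the current DOM.
        found = driver.execute_script(
            "return document.querySelector(arguments[0]) !== null;",
            css_selector,
        )
        if found:
            # Serialize the DOM as it is now, including script-generated nodes.
            return driver.execute_script(
                "return document.documentElement.outerHTML;"
            )
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{css_selector!r} did not appear within {timeout}s")
        time.sleep(poll)
```

With a real driver this would be called as, for example, html = rendered_html(driver, "#some-id"), where "#some-id" is a placeholder for the selector of the element you are after. Note that content inside iframes or shadow roots will still not show up in this output.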

Upvotes: 1

Jansindl3r

Reputation: 399

You will have to download the driver for a web browser that can render this dynamic content, for example ChromeDriver from http://chromedriver.chromium.org/downloads

Based on the example at http://chromedriver.chromium.org/getting-started:

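import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# In Selenium 4 the chromedriver path is passed through a Service object.
# It is optional; if omitted, chromedriver is looked up on PATH.
driver = webdriver.Chrome(service=Service('/path/to/chromedriver'))
driver.get('http://www.google.com/xhtml')
time.sleep(5)  # Let the user actually see something!
search_box = driver.find_element(By.NAME, 'q')  # find_element_by_name was removed in Selenium 4
search_box.send_keys('ChromeDriver')
search_box.submit()
time.sleep(5)  # Let the user actually see something!
driver.quit()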

This will pop up a Chrome window, run the steps, and load the content. Don't forget to close the browser afterwards, and use time.sleep() so the page has some time to generate the content. You can also run Chrome headless, in which case it renders in a virtual window and you can set, for example, a window width and height of 4000px; normal mode doesn't allow that.
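
A rough sketch of the headless setup mentioned above, assuming Selenium and Chrome are installed. The function names headless_chrome_args and make_headless_driver are invented for this example; --headless and --window-size are Chrome command-line switches, and the 4000x4000 size is just the figure from the text.

```python
def headless_chrome_args(width, height):
    """Return Chrome command-line switches for a headless window of the given size."""
    return ["--headless", f"--window-size={width},{height}"]


def make_headless_driver(width=4000, height=4000):
    """Start headless Chrome; needs the selenium package and a Chrome install."""
    # Imported here so the helper above can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for arg in headless_chrome_args(width, height):
        options.add_argument(arg)
    return webdriver.Chrome(options=options)


# Possible usage (the URL is a placeholder):
# driver = make_headless_driver()
# try:
#     driver.get("https://example.com")
#     html = driver.page_source
# finally:
#     driver.quit()
```

Calling quit() in a finally block makes sure the browser is closed even if loading or parsing the page fails.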

Upvotes: 0
