crookedleaf

Reputation: 2198

Selenium driver's page source different than browser

Unfortunately, I am not able to post code to reproduce this problem, since it involves signing into a site that is not public. But my question is more general than a code problem. Essentially, driver.page_source does not match what shows up in the browser it is driving. This is not an issue with elements not loading fully, because I am testing this while executing code line by line in a Python terminal. I am looking at the page source in the browser after right-clicking and choosing "view page source", but if I print driver.page_source or attempt to find_element_by_[...], it shows slightly different code with entire elements missing. Here is the HTML in question:

<nav role="navigation" class="utility-nav__wrapper--right">
<input id="hdn_partyId" value="1965629" type="hidden">
<input id="hdn_firstName" value="CHARLES" type="hidden">
<input id="hdn_sessionId" value="uHxQhlARvzA7N16uh+KJAdNFIcY6D8f9ornqoPQ" type="hidden">
<input id="hdn_cmsAlertRequest" type="hidden" value="Biennial Plus">
<ul class="h-list h-list--middle">
    [...]
</ul>

I need all 4 of the input elements. However, the hdn_partyId and hdn_sessionId elements do not appear in Selenium's .page_source, and if I try to get them with .find_element_by_[...] I get a NoSuchElementException.

I even ran a check that finds all input elements and lists them, and these 2 do not show up.

Does anyone have any idea why Selenium would not provide the same content as looking directly at the browser it is driving?

EDIT: To clarify, I am driving Chrome with Chromedriver through Selenium. This is not an issue with the page not fully loading. As I mentioned, I am running this manually, line by line, in a Python terminal rather than executing a script. So the browser pops up, loads the page, and logs in; then I manually check the browser's page source and see the element, then I print driver.page_source and it's not there, and if I run session_id = driver.find_element_by_id('hdn_sessionId') I get a NoSuchElementException. There are also no frames at all in the page, nor any additional windows.

Upvotes: 22

Views: 17016

Answers (4)

Steve Adams

Reputation: 19

Quite often when using Selenium, waiting does the trick without needing a lot of extra code (i.e. giving the full DOM a few seconds to load). In the example below, the HTML that is gathered reflects what you see when you 'inspect' the page, as opposed to 'view source', which displays the page as it was before any JavaScript ran.

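from time import sleep
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

# Selenium 3 style call; newer Selenium 4 releases take the path via a Service object
driver = webdriver.Chrome(ChromeDriverManager().install())

driver.get(url)  # url is the address of the page under test
sleep(10)  # fixed pause so client-side JavaScript can finish changing the DOM
HTML = driver.page_source  # serialized current DOM, not the original response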
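
If a fixed sleep feels too blunt, polling for a condition is a common alternative. Below is a minimal standard-library sketch of that idea; the helper name wait_until and its parameters are made up for illustration and are not part of Selenium (Selenium's own WebDriverWait covers the same ground).

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Hypothetical helper for illustration only. Returns the truthy result,
    or raises TimeoutError once `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f seconds" % timeout)
        time.sleep(interval)
```

With a live driver this could be used as wait_until(lambda: "hdn_firstName" in driver.page_source), assuming that id is one you expect to appear. Note that waiting only helps when content arrives late; it cannot bring back elements that a script has removed.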

Upvotes: 0

yash shah

Reputation: 69

Try it like this to get the original source code. The "view-source:" prefix can differ between browsers; this one is for Chrome.

driver.get("view-source:"+url)

sourcecode=driver.find_element_by_tag_name('body').text

Upvotes: 6

Ger Mc

Reputation: 640

If you locate the 'body' of the page then use get_attribute('innerHTML') you can access everything from the page.
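
A minimal sketch of that approach, assuming driver is an already running Selenium WebDriver. The function name body_inner_html is made up for illustration; "tag name" is the locator string that Selenium's By.TAG_NAME stands for.

```python
def body_inner_html(driver):
    """Return the innerHTML of the page's <body> element.

    `driver` is assumed to be a Selenium WebDriver instance that has
    already loaded the page.
    """
    body = driver.find_element("tag name", "body")
    return body.get_attribute("innerHTML")
```

Like page_source, innerHTML is a serialization of the current DOM, so elements that a script has already removed will not show up here either.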

Upvotes: 0

crookedleaf

Reputation: 2198

A coworker of mine has figured out the issue and a workaround. Essentially, after the page is done loading, it runs JavaScript that cleans up the DOM. What "view page source" in the browser shows is not the current state. Running print driver.page_source or using any form of driver.find_element_by_[...] pulls from the current, up-to-date DOM, while the browser's "view page source" only shows what was provided when the page first loaded. If you start 'inspecting' the page in Chrome, you will see that the HTML is different from what the browser says the "page source" is. After reverse engineering the JavaScript, we are able to run partyid = driver.execute_script('return accountdata.$partyId.val();') and get the value that was originally assigned. I hope this is enough info to help other people who may run into this issue in the future.
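
For anyone who wants to confirm that a page behaves this way, here is a rough standard-library sketch that lists which input ids are present in the original source but gone from the live DOM. The names InputIdCollector, input_ids and removed_input_ids are made up for illustration; it assumes you have both HTML strings at hand, for example the text copied from "view page source" and the value of driver.page_source.

```python
from html.parser import HTMLParser


class InputIdCollector(HTMLParser):
    """Collect the id attribute of every <input> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            element_id = dict(attrs).get("id")
            if element_id:
                self.ids.append(element_id)


def input_ids(html):
    """Return the ids of all <input> elements found in an HTML string."""
    collector = InputIdCollector()
    collector.feed(html)
    collector.close()
    return collector.ids


def removed_input_ids(original_html, live_html):
    """Return ids present in the original source but absent from the live DOM."""
    live = set(input_ids(live_html))
    return [i for i in input_ids(original_html) if i not in live]
```

Calling removed_input_ids(original_html, driver.page_source) returns the ids that a script has stripped out after load. An empty list suggests the difference lies elsewhere.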

Upvotes: 17
