Reputation: 2198
i unfortunately am not able to post code to reproduce this problem, since it involves signing into a site that is not a public site. but my question is more general than code problems. essentially, driver.page_source
does not match what shows up in the browser it is driving. this is not an issue with elements not loading fully because i am testing this while executing code line by line in a python terminal. i am looking at the page source in the browser after right clicking and going to "view page source", and but if i print driver.page_source
or attempt to find_element_by_[...]
, it shows slightly different code with entire elements missing. here is the html in question:
<nav role="navigation" class="utility-nav__wrapper--right">
<input id="hdn_partyId" value="1965629" type="hidden">
<input id="hdn_firstName" value="CHARLES" type="hidden">
<input id="hdn_sessionId" value="uHxQhlARvzA7N16uh+KJAdNFIcY6D8f9ornqoPQ" type="hidden">
<input id="hdn_cmsAlertRequest" type="hidden" value="Biennial Plus">
<ul class="h-list h-list--middle">
[...]
</ul>
i need all 4 of the input elements, however, hdn_partyId
and hdn_sessionId
elements do not appear in selenium's .page_source
and if i try to get them with .find_element_by_[...]
i get a NoSuchElementException
i even ran a check on finding all input
elements and listing them, and these 2 do not show up.
does anyone have any idea why selenium would not provide the same content as directly looking at the browser it is driving?
EDIT: to clarify... i am driving Chrome with Chromedriver through Selenium. this is not an issue with the page not fully loading. as i mentioned, i am running this manually line by line through a python terminal and not executing a script. so the browser pops up, loads the page, logs in, and then i manually check the browser's page source and see the element, then i print driver.page_source
and it's not there, and if i run session_id = driver.find_element_by_id('hdn_sessionId')
i get a NoSuchElementException
. there are also no frames at all in the page, nor any additional windows.
Upvotes: 22
Views: 17016
Reputation: 19
Quite often when using selenium, waiting does the trick without needing a lot of extra code (i.e. giving a few seconds for the full DOM to load). So in the example below, the HTML that was gathered reflected what one would see when one 'inspects' as opposed to using 'view source', which displayed pre-JS DOM
from time import sleep
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
driver = webdriver.Chrome(ChromeDriverManager().install())
driver.get(url)
sleep(10)
HTML = driver.page_source
Upvotes: 0
Reputation: 69
try like this you will get source code keyword "view-source:" which can be different according to your browser this is for the chrome
driver.get("view-source:"+url)
sourcecode=driver.find_element_by_tag_name('body').text
Upvotes: 6
Reputation: 640
If you locate the 'body'
of the page then use get_attribute('innerHTML')
you can access everything from the page.
Upvotes: 0
Reputation: 2198
A coworker of mine has figured out the issue and a workaround. Essentially, after the page is done loading, it runs a javascript command that cleans up the DOM. What the "view page source" in the browser shows is not what the current state is. So running print driver.page_source
or using any form of driver.find_element_by_[...]
is pulling from the newest and freshest page data, while the browser's "view page source" only shows what was provided when the page first loaded. If you start 'inspecting' the page in Chrome, you will see the HTML is different than what the browser says the "page source" is. After reverse engineering the Javascript, we are able to run partyid = driver.execute_script('return accountdata.$partyId.val();')
and get what was originally assigned. I hope this is enough info to help other people who may run into this issue in the future.
Upvotes: 17