Roi

Reputation: 61

Selenium driver.page_source extracts only partial HTML DOM

I have a webpage where right-clicking and choosing View Page Source gives me the output shown in SECTION-A below.

But when I right-click and choose Inspect, I get much longer output. I also tried to get the page source using JS, but I hit the same problem and only get the output in SECTION-A. How can I fix this?

Note: I'm looking for a universal solution, not one that only works for this specific website.

What I tried:

time.sleep(3)
html1 = driver.execute_script("return document.documentElement.outerHTML")
html2 = driver.execute_script("return document.getElementsByTagName('html')[0].innerHTML")
html3 = driver.page_source

I'm using Chrome. Is there any flag or other solution for this problem?


SECTION-A:

<head><script language="javascript" type="text/javascript">
var framePara = new Array(
0,
"main.htm",
1,
0,0 );
</script>
<script language="javascript" type="text/javascript">
var indexPara = new Array(
"192.168.0.1",
1742822853,
"tplinklogin.net",
0,0 );
</script>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

<title>TL-WR845N</title>
<meta http-equiv="Pragma" content="no-cache">
<meta http-equiv="Expires" content="wed, 26 Feb 1997 08:21:57 GMT">
<link href="../dynaform/css_main.css" rel="stylesheet" type="text/css">
<script language="javascript" src="../dynaform/common.js" type="text/javascript"></script>
<script language="javascript" type="text/javascript"><!--
//--></script>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<script language="javascript" src="../localiztion/char_set.js" type="text/javascript">
</script><script type="text/javascript">
var startUrl = "";
var startHelpUrl = "";
if(framePara[0] == 1)
{
    startUrl = "../userRpm/WzdStartRpm.htm";
    startHelpUrl = "../help/WzdStartHelpRpm.htm";
}
else
{
    startUrl = "../userRpm/StatusRpm.htm";
    /*changed by ZQQ, 2015.7.25, corresponding to function StatusRpmHtm*/
    if (framePara[2] == 0x08 || framePara[2] == 0x07 || framePara[2] == 0x06 || framePara[2] == 0x03)
    {
        startHelpUrl = "../help/StatusHelpRpm_AP.htm";
    }
    else if (framePara[2] == 0x04)
    {
        startHelpUrl = "../help/StatusHelpRpm_APC.htm";
    }
    else
    {
        startHelpUrl = "../help/StatusHelpRpm.htm";
    }
}
document.write("<FRAMESET rows=90,*>");
document.write("<FRAME name=topFrame marginWidth=0 marginHeight=0 src=\"../frames/top.htm\" noResize scrolling=no frameSpacing=0 frameBorder=0 id=\"topFrame\">");
document.write("<FRAMESET cols=182,55%,*>");
document.write("<FRAME name=bottomLeftFrame marginWidth=0 marginHeight=0 src=\"../userRpm/MenuRpm.htm\" noResize frameBorder=1 scrolling=auto style=\"overflow-x:hidden\" id=\"bottomLeftFrame\">");
document.write("<FRAME name=mainFrame marginWidth=0 marginHeight=0 src=" +startUrl+" frameBorder=1 id=\"mainFrame\">");
document.write("<FRAME name=helpFrame marginWidth=0 marginHeight=0 src="+startHelpUrl+" frameBorder=1 id=\"helpFrame\">");
document.write("</FRAMESET>");
</script></head>

        
    
<frameset rows="90,*"><frame name="topFrame" marginwidth="0" marginheight="0" src="../frames/top.htm" noresize="" scrolling="no" framespacing="0" frameborder="0" id="topFrame"><frameset cols="182,55%,*"><frame name="bottomLeftFrame" marginwidth="0" marginheight="0" src="../userRpm/MenuRpm.htm" noresize="" frameborder="1" scrolling="auto" style="overflow-x:hidden" id="bottomLeftFrame"><frame name="mainFrame" marginwidth="0" marginheight="0" src="../userRpm/StatusRpm.htm" frameborder="1" id="mainFrame"><frame name="helpFrame" marginwidth="0" marginheight="0" src="../help/StatusHelpRpm.htm" frameborder="1" id="helpFrame"></frameset>

<noframes>
    <body id="t_noFrame">Please upgrade to a version 4 or higher browser so that you can use this setup tool.</body>
</noframes>


</frameset>

Upvotes: 4

Views: 1908

Answers (1)

undetected Selenium

Reputation: 193258

There can be a substantial difference between the markup shown through View Source and the markup shown through the Inspector tool. Both are browser features for examining a page's markup. However, the core difference between them is:

  • View Source shows the HTML that was delivered from the AUT (Application Under Test) to the browser.
  • Inspect Element is a developer tool (e.g. Chrome DevTools) that shows the state of the HTML DOM after the browser has applied its error correction and after any JavaScript has manipulated the DOM. In short, View Source shows you the JavaScript as delivered but not the HTML that it generates, and HTML errors may appear corrected in the Inspect Element tool.

Hence you see a larger output using Inspect.
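
To make the difference concrete, here is a minimal standard-library sketch with no browser involved. The names TagCollector and element_names are hypothetical helpers, and the two strings are trimmed-down stand-ins for the frameset page in the question, not its real markup. The point is only that a frame written by document.write() is plain script text in the delivered HTML and becomes an element only after the script has run:

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record the name of every element start tag the parser encounters."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def element_names(html):
    """Return the element names found in an HTML string, in document order."""
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return collector.tags


# Stand-in for what View Source shows: the frames exist only as
# string literals inside a <script> block.
delivered = (
    '<head><script type="text/javascript">'
    'document.write("<FRAMESET rows=90,*>");'
    'document.write("<FRAME name=mainFrame>");'
    '</script></head>'
)

# Stand-in for what Inspect shows: the script has run, so the frames
# are now real elements in the DOM.
rendered = delivered + '<frameset rows="90,*"><frame name="mainFrame"></frameset>'

print("frame" in element_names(delivered))  # False
print("frame" in element_names(rendered))   # True
```

The parser treats the contents of a script element as raw text, which is why the first check finds no frame element at all.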

You can find a relevant, detailed discussion in "Get web elements as they shown through view source".


Solution

page_source is one of the most effective and proven approaches for extracting the page source with Selenium. However, there is a catch: you need to induce WebDriverWait for the visibility_of_element_located() of a static element within the webpage before reading it. As an example, to extract the page_source of the webpage https://example.com, you can induce WebDriverWait for the <h1> tag with innerText Example Domain to become visible, as follows:

  • Using XPATH:

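    driver.get("https://example.com")
    # Wait up to 20 seconds for the static <h1> element to become visible
    WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//h1[text()='Example Domain']")))
    # page_source is a property in the Python bindings, so read it without parentheses
    print(driver.page_source)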
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    

Upvotes: 1
