Dail

Reputation: 4608

Why does PhantomJS not return the correct source?

I did a very small test:

var page = require('webpage').create(),
    fs   = require('fs');

page.open("http://www.google.it/search?q=web+design", function (status) {
    if (status === 'success') {
        // Save a screenshot and the current DOM serialized as HTML.
        page.render('google.png');
        fs.write("source.html", page.content, 'w');
    }

    phantom.exit();
});

As you can see, I search for "web design" on google.it.

Now, looking at source.html, I noticed differences between the source generated by PhantomJS and the real HTML (as shown in Chrome's element inspector).

In my source, a result looks like this:

<li class="g">
   <h3 class="r"><a href="/url?q=http://www.html.it/web-design/&amp;sa=U&amp;ei=Z2LZUbSaBcGV7Abm54BI&amp;ved=0CCwQFjAB&amp;usg=AFQjCNGagkxLs36cXSzGjyhnBX7duCI6dA"><b>WebDesign</b> - Guide e approfondimenti per webdesigner - HTML.it</a></h3>
   <div class="s">
      <div class="kv" style="margin-bottom:2px"><cite>www.html.it/<b>web</b>-<b>design</b>/</cite><span class="flc"> - <a href="/url?q=http://webcache.googleusercontent.com/search%3Fq%3Dcache:3GWnT4NPDr0J:http://www.html.it/web-design/%252Bweb%2Bdesign%26hl%3Dit%26ct%3Dclnk&amp;sa=U&amp;ei=Z2LZUbSaBcGV7Abm54BI&amp;ved=0CC0QIDAB&amp;usg=AFQjCNE_1Gt5RL9WQAGZpM_3f-oxZ1VR9w">Copia cache</a></span></div>
      <span class="st">WebDesign: progettazione Web, User Experience, Architettura dell'informazione, <br>  i consigli di esperti designer in guide e articoli di approfondimento in italiano.</span><br>
   </div>
</li>

But the real source (as read via Chrome's element inspector) is:

<li class="g">
   <!--m-->
   <div data-hveid="55" class="rc">
      <span style="float:left"></span>
      <h3 class="r"><a href="/url?sa=t&amp;rct=j&amp;q=&amp;esrc=s&amp;source=web&amp;cd=2&amp;cad=rja&amp;ved=0CDgQFjAB&amp;url=http%3A%2F%2Fwww.html.it%2Fweb-design%2F&amp;ei=wmTZUfHdOYSO7AagwIHwDw&amp;usg=AFQjCNFaDZWWczDbce8TlYh9oqYluJ-E5g&amp;bvm=bv.48705608,d.ZGU" onmousedown="return rwt(this,'','','','2','AFQjCNFaDZWWczDbce8TlYh9oqYluJ-E5g','','0CDgQFjAB','','',event)"><em>WebDesign</em> - Guide e approfondimenti per webdesigner - HTML.it</a></h3>
      <div class="s">
         <div>
            <div class="f kv" style="white-space:nowrap">
               <cite>www.html.it/<b>web</b>-<b>design</b>/</cite>‎
               <div class="action-menu ab_ctl">
                  <a href="#" data-ved="0CDkQ7B0wAQ" class="clickable-dropdown-arrow ab_button" id="am-b1" aria-label="Dettagli risultato" jsaction="ab.tdd; keydown:ab.hbke; keypress:ab.mskpe" role="button" aria-haspopup="true" aria-expanded="false"><span class="mn-dwn-arw"></span></a>
                  <div data-ved="0CDoQqR8wAQ" class="action-menu-panel ab_dropdown" jsaction="keydown:ab.hdke; mouseover:ab.hdhne; mouseout:ab.hdhue" role="menu" tabindex="-1">
                     <ul>
                        <li class="action-menu-item ab_dropdownitem" role="menuitem"><a href="http://webcache.googleusercontent.com/search?q=cache:3GWnT4NPDr0J:www.html.it/web-design/+&amp;cd=2&amp;hl=it&amp;ct=clnk&amp;gl=it&amp;client=ubuntu" onmousedown="return rwt(this,'','','','2','AFQjCNEaothLaL83HBobw4UE8q_OpkIPrw','','0CDsQIDAB','','',event)" class="fl">Copia&nbsp;cache</a></li>
                     </ul>
                  </div>
               </div>
            </div>
            <div class="f slp"></div>
            <span class="st"><em>WebDesign</em>: progettazione Web, User Experience, Architettura dell'informazione, i consigli di esperti designer in guide e articoli di approfondimento in italiano.</span>
         </div>
      </div>
   </div>
   <!--n-->
</li>

As you can see, the second snippet is more complete.

So my question is:

Why do these results have different code?

I read that PhantomJS executes all the JS inside the page as a browser does, so why are there differences?

Thank you!

Upvotes: 2

Views: 638

Answers (2)

NiKo

Reputation: 11411

Try waiting until all the DOM transformations made by Google's JS code have been performed. For example, you can wait for the .action-menu element to become available (disclaimer: as the CasperJS author, I'm using CasperJS here):

var fs = require('fs');

require('casper').create()
    .start("http://www.google.it/search?q=web+design")
    .waitForSelector(".action-menu", function() {
        this.capture('google.png');
        fs.write("source.html", this.getPageContent(), 'w'); 
    }).run();
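For readers who are not using CasperJS, the same idea can be sketched in plain PhantomJS by polling the page until the selector appears. This is only a minimal sketch: `waitFor` is a hypothetical helper written for this example (it is not part of the PhantomJS API), and the 5000 ms timeout and 100 ms polling interval are arbitrary assumptions.

```javascript
// Hypothetical helper (not a PhantomJS built-in): polls `condition` every
// `intervalMs` until it returns true, or gives up after `timeoutMs`.
function waitFor(condition, onReady, onTimeout, timeoutMs, intervalMs) {
    var start = Date.now();
    var timer = setInterval(function () {
        if (condition()) {
            clearInterval(timer);
            onReady();
        } else if (Date.now() - start >= timeoutMs) {
            clearInterval(timer);
            onTimeout();
        }
    }, intervalMs);
}

// This part only runs inside PhantomJS, where the global `phantom` exists.
if (typeof phantom !== 'undefined') {
    var fs = require('fs');
    var page = require('webpage').create();

    page.open('http://www.google.it/search?q=web+design', function (status) {
        if (status !== 'success') {
            phantom.exit(1);
            return;
        }
        waitFor(function () {
            // Evaluated in the page context: has the element been added yet?
            return page.evaluate(function () {
                return document.querySelector('.action-menu') !== null;
            });
        }, function () {
            page.render('google.png');
            fs.write('source.html', page.content, 'w');
            phantom.exit();
        }, function () {
            console.log('Timed out waiting for .action-menu');
            phantom.exit(1);
        }, 5000, 100);
    });
}
```

If the element never appears (for instance because the server sent different markup), the timeout branch fires instead of hanging forever.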

Upvotes: 1

Vitaly Slobodin

Reputation: 1359

Because PhantomJS sends a different user agent, so Google serves it different markup. If you change the user agent to Google Chrome's, you'll receive the same result as in Google Chrome.

You can change the user agent via the page.settings.userAgent property; set it before calling page.open.
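A minimal sketch of that change, applied to the script from the question. The user-agent string below is only an illustrative Chrome value (an assumption; substitute whatever your own browser sends), and `useChromeUserAgent` is a hypothetical helper name introduced for this example.

```javascript
// Illustrative Chrome user-agent string (assumption: replace with your browser's).
var CHROME_UA = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 ' +
                '(KHTML, like Gecko) Chrome/27.0.1453.93 Safari/537.36';

// Hypothetical helper: sets the user agent on a PhantomJS page object.
function useChromeUserAgent(page) {
    page.settings.userAgent = CHROME_UA;
    return page;
}

// This part only runs inside PhantomJS, where the global `phantom` exists.
if (typeof phantom !== 'undefined') {
    var fs = require('fs');
    // The setting must be in place before the request is made.
    var page = useChromeUserAgent(require('webpage').create());

    page.open('http://www.google.it/search?q=web+design', function (status) {
        if (status === 'success') {
            fs.write('source.html', page.content, 'w');
        }
        phantom.exit();
    });
}
```

Comparing the resulting source.html with the one produced without the override should show whether the user agent accounts for the difference in your case.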

Upvotes: 2
