Reputation: 4608
I ran a very small test:
var page = require('webpage').create()
  , fs = require('fs');
page.open("http://www.google.it/search?q=web+design", function(status){
    if (status === 'success')
    {
        page.render('google.png');
        fs.write("source.html", page.content, 'w');
    }
    phantom.exit();
});
As you can see, I search for "web design" on google.it.
Looking at source.html, I noticed differences between the source generated by PhantomJS and the real HTML (as shown in Chrome's element inspector).
In my source, a result looks like this:
<li class="g">
<h3 class="r"><a href="/url?q=http://www.html.it/web-design/&sa=U&ei=Z2LZUbSaBcGV7Abm54BI&ved=0CCwQFjAB&usg=AFQjCNGagkxLs36cXSzGjyhnBX7duCI6dA"><b>WebDesign</b> - Guide e approfondimenti per webdesigner - HTML.it</a></h3>
<div class="s">
<div class="kv" style="margin-bottom:2px"><cite>www.html.it/<b>web</b>-<b>design</b>/</cite><span class="flc"> - <a href="/url?q=http://webcache.googleusercontent.com/search%3Fq%3Dcache:3GWnT4NPDr0J:http://www.html.it/web-design/%252Bweb%2Bdesign%26hl%3Dit%26ct%3Dclnk&sa=U&ei=Z2LZUbSaBcGV7Abm54BI&ved=0CC0QIDAB&usg=AFQjCNE_1Gt5RL9WQAGZpM_3f-oxZ1VR9w">Copia cache</a></span></div>
<span class="st">WebDesign: progettazione Web, User Experience, Architettura dell'informazione, <br> i consigli di esperti designer in guide e articoli di approfondimento in italiano.</span><br>
</div>
</li>
But the real source (read via Chrome's element inspector) is:
<li class="g">
<!--m-->
<div data-hveid="55" class="rc">
<span style="float:left"></span>
<h3 class="r"><a href="/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&cad=rja&ved=0CDgQFjAB&url=http%3A%2F%2Fwww.html.it%2Fweb-design%2F&ei=wmTZUfHdOYSO7AagwIHwDw&usg=AFQjCNFaDZWWczDbce8TlYh9oqYluJ-E5g&bvm=bv.48705608,d.ZGU" onmousedown="return rwt(this,'','','','2','AFQjCNFaDZWWczDbce8TlYh9oqYluJ-E5g','','0CDgQFjAB','','',event)"><em>WebDesign</em> - Guide e approfondimenti per webdesigner - HTML.it</a></h3>
<div class="s">
<div>
<div class="f kv" style="white-space:nowrap">
<cite>www.html.it/<b>web</b>-<b>design</b>/</cite>
<div class="action-menu ab_ctl">
<a href="#" data-ved="0CDkQ7B0wAQ" class="clickable-dropdown-arrow ab_button" id="am-b1" aria-label="Dettagli risultato" jsaction="ab.tdd; keydown:ab.hbke; keypress:ab.mskpe" role="button" aria-haspopup="true" aria-expanded="false"><span class="mn-dwn-arw"></span></a>
<div data-ved="0CDoQqR8wAQ" class="action-menu-panel ab_dropdown" jsaction="keydown:ab.hdke; mouseover:ab.hdhne; mouseout:ab.hdhue" role="menu" tabindex="-1">
<ul>
<li class="action-menu-item ab_dropdownitem" role="menuitem"><a href="http://webcache.googleusercontent.com/search?q=cache:3GWnT4NPDr0J:www.html.it/web-design/+&cd=2&hl=it&ct=clnk&gl=it&client=ubuntu" onmousedown="return rwt(this,'','','','2','AFQjCNEaothLaL83HBobw4UE8q_OpkIPrw','','0CDsQIDAB','','',event)" class="fl">Copia cache</a></li>
</ul>
</div>
</div>
</div>
<div class="f slp"></div>
<span class="st"><em>WebDesign</em>: progettazione Web, User Experience, Architettura dell'informazione, i consigli di esperti designer in guide e articoli di approfondimento in italiano.</span>
</div>
</div>
</div>
<!--n-->
</li>
As you can see, the second version is more complete.
So my question is:
Why do those results have different code?
I read that PhantomJS executes all the JavaScript inside the page as a browser does, so why the differences?
Thank you!
Upvotes: 2
Views: 638
Reputation: 11411
You may need to wait until all the DOM transformations made by Google's JavaScript have been performed. For example, this can be achieved by waiting for the .action-menu element to be available (disclaimer: as the CasperJS author, I'm using CasperJS here):
var fs = require('fs');
require('casper').create()
    .start("http://www.google.it/search?q=web+design")
    .waitForSelector(".action-menu", function() {
        this.capture('google.png');
        fs.write("source.html", this.getPageContent(), 'w');
    }).run();
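If you would rather stay with plain PhantomJS, the same idea can be sketched as a polling loop. This is only a rough sketch: waitFor is a hypothetical helper written here for illustration (it is not part of PhantomJS), and the 5000 ms timeout and 100 ms polling interval are arbitrary assumptions. The PhantomJS part only runs when the global phantom object exists.

```javascript
// Hypothetical helper (not a PhantomJS API): calls onReady as soon as
// condition() returns true, or onTimeout once timeoutMs has elapsed.
function waitFor(condition, onReady, onTimeout, timeoutMs, intervalMs) {
  // Check once right away so an already-satisfied condition needs no timer.
  if (condition()) {
    onReady();
    return;
  }
  var start = Date.now();
  var timer = setInterval(function () {
    if (condition()) {
      clearInterval(timer);
      onReady();
    } else if (Date.now() - start >= timeoutMs) {
      clearInterval(timer);
      onTimeout();
    }
  }, intervalMs || 100); // assumed default polling interval
}

// Only runs inside PhantomJS, where the global `phantom` object exists.
if (typeof phantom !== 'undefined') {
  var fs = require('fs');
  var page = require('webpage').create();
  page.open('http://www.google.it/search?q=web+design', function (status) {
    if (status !== 'success') {
      phantom.exit(1);
      return;
    }
    waitFor(
      function () {
        // Evaluated in the page context: has the element been inserted yet?
        return page.evaluate(function () {
          return document.querySelector('.action-menu') !== null;
        });
      },
      function () {
        page.render('google.png');
        fs.write('source.html', page.content, 'w');
        phantom.exit();
      },
      function () {
        // The element never showed up within the assumed timeout.
        phantom.exit(1);
      },
      5000
    );
  });
}
```

If the element never appears (for instance because Google serves a different page to this client), the timeout branch exits with a non-zero code instead of hanging.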
Upvotes: 1
Reputation: 1359
Because PhantomJS sends a different user agent, and Google serves different markup depending on it. If you change the user agent to Google Chrome's, you should receive the same result as in Google Chrome.
You can change the user agent via the page.settings.userAgent property.
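A minimal sketch of that, reusing the script from the question. useChromeUserAgent is a hypothetical helper introduced only for illustration, and the user agent string is just an example Chrome string that I am assuming; copy the real one from your own browser. The PhantomJS part only runs when the global phantom object exists.

```javascript
// Example Chrome user agent string (an assumption; replace it with the one
// your own browser actually sends).
var CHROME_UA = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 ' +
                '(KHTML, like Gecko) Chrome/27.0.1453.110 Safari/537.36';

// Hypothetical helper: sets the user agent on a PhantomJS-style page object.
function useChromeUserAgent(page, userAgent) {
  page.settings = page.settings || {};
  page.settings.userAgent = userAgent || CHROME_UA;
  return page;
}

// Only runs inside PhantomJS, where the global `phantom` object exists.
if (typeof phantom !== 'undefined') {
  var fs = require('fs');
  // The user agent must be set before page.open() for it to be sent.
  var page = useChromeUserAgent(require('webpage').create());
  page.open('http://www.google.it/search?q=web+design', function (status) {
    if (status === 'success') {
      page.render('google.png');
      fs.write('source.html', page.content, 'w');
    }
    phantom.exit();
  });
}
```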
Upvotes: 2