João Mosmann

Reputation: 2902

How to get the page HTML before it is changed by JavaScript in PhantomJS

I'm trying to get the page's original HTML, as served and before any scripts run, so that I can diff it against the HTML after the scripts have been evaluated. So far I haven't found a way to do it.

I checked the Web Page Module API documentation: http://phantomjs.org/api/webpage/

But every event gives me either the HTML after it has already been modified by the page scripts, or an empty HTML structure.

Upvotes: 0

Views: 82

Answers (1)

Artjom B.

Reputation: 61892

There is no API call for that, but you can easily download the original page source again with a separate XHR from inside the page:

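// Runs inside the page context: re-requests the current URL and returns the raw response body.
var originalSource = page.evaluate(function(){
    var xhr = new XMLHttpRequest();
    // Passing `false` makes the request synchronous, so the result can be returned directly.
    xhr.open("GET", window.location.href, false);
    xhr.send();
    return xhr.responseText;
});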

However, this will probably not work if the page source depends on the session. Tweaking the request headers might help in that case. See also: Can I get the original page source (vs current DOM) with phantomjs/casperjs?
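To tie this back to the diff the question asks about, here is a minimal sketch of how the XHR approach could be combined with `page.content` (the DOM serialized after the scripts have run). The helper names `fetchOriginalSource` and `addedLines` are hypothetical, the URL is a placeholder, and the line-based comparison is deliberately naive. It is not a real diff algorithm, just an illustration under those assumptions.

```javascript
// Hypothetical helper, meant to be passed to page.evaluate: it runs inside
// the page context, re-requests the current URL synchronously and returns
// the raw response body.
function fetchOriginalSource() {
    var xhr = new XMLHttpRequest();
    xhr.open("GET", window.location.href, false);
    xhr.send();
    return xhr.responseText;
}

// Hypothetical helper: naive line-based comparison. Returns the non-empty
// lines of `rendered` whose trimmed text does not appear in `original`.
function addedLines(original, rendered) {
    var seen = Object.create(null);
    original.split("\n").forEach(function (line) {
        seen[line.trim()] = true;
    });
    return rendered.split("\n").filter(function (line) {
        var trimmed = line.trim();
        return trimmed !== "" && !seen[trimmed];
    });
}

// This part only runs under PhantomJS, where the global `phantom` exists.
if (typeof phantom !== "undefined") {
    var page = require("webpage").create();
    var url = "http://example.com/"; // placeholder URL (assumption)

    page.open(url, function (status) {
        if (status !== "success") {
            console.log("Failed to load " + url);
            phantom.exit(1);
            return;
        }
        var originalSource = page.evaluate(fetchOriginalSource);
        var renderedSource = page.content; // HTML after the scripts ran
        console.log(addedLines(originalSource, renderedSource).join("\n"));
        phantom.exit();
    });
}
```

Because the original source comes from a second request, the same session caveat applies here. The browser's serialization in `page.content` can also differ from the served markup in whitespace and attribute formatting, so a real diff tool would likely give cleaner results than this line comparison.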

Upvotes: 1
