Reputation: 19
I want to download an entire webpage along with the content (CSS, JavaScript, images, and other external resources) needed to render it in any browser, using PhantomJS. I do not want to execute scripts, but simply parse the CSS and JavaScript for further links to content and download those too.
I have tried tools like wget (which does exactly what I need, but is very slow because it uses a single TCP connection to the web server) and HTTrack (which downloads entire websites; in my case I only want the content required to render the page, found by recursively parsing links in files). I am currently trying to use PhantomJS for this purpose, but I have not found the right way to do it.
Upvotes: 0
Views: 2587
Reputation: 135
Try this code:
var page = require('webpage').create();
var url = "your url goes here";
var fs = require('fs');
var path = 'index.html'; // you might want to change the format to .json, .txt, etc.
page.open(url, function (status) {
    if (status !== 'success') {
        console.log('Connection failed, page was not loaded!');
    } else {
        var content = page.content;
        fs.write(path, content, 'w');
    }
    phantom.exit();
});
This should give you the full HTML content of the rendered page. If you need further assistance, please let me know!
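Note that page.content only captures the HTML itself, not the external resources. If you also want the URLs of the CSS, JavaScript, and images the page pulls in, PhantomJS can report them through its onResourceRequested callback. A minimal sketch (the url value is a placeholder you would fill in; the collected URLs could then be fed to wget or curl for downloading):

```javascript
// PhantomJS script: list every resource URL the page requests,
// so they can be downloaded separately afterwards.
var page = require('webpage').create();
var url = "your url goes here";
var resources = [];

// Called once for every resource (CSS, JS, image, ...) the page requests.
page.onResourceRequested = function (requestData, networkRequest) {
    resources.push(requestData.url);
};

page.open(url, function (status) {
    if (status !== 'success') {
        console.log('Connection failed, page was not loaded!');
    } else {
        // Print one resource URL per line; pipe this into wget/curl to fetch them.
        resources.forEach(function (u) {
            console.log(u);
        });
    }
    phantom.exit();
});
```

One caveat: PhantomJS executes the page's JavaScript while loading, so this lists the resources the browser actually requested at load time, which is not quite the same as statically parsing the CSS and JavaScript for links without executing anything.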
Upvotes: 1