Dave

Reputation: 19

How to download a webpage with all related content using PhantomJS

I want to use PhantomJS to download an entire webpage along with all the content (CSS, JavaScript, images, and other external resources) that may be required to render it in any browser. I do not want to execute scripts; I only want to parse the CSS and JavaScript for further links and download that content too.
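The "parse the CSS for more links" step could look roughly like the function below. This is a minimal, regex-based sketch, not a real CSS parser: the name extractCssUrls is hypothetical, it ignores CSS comments and escaped characters, and the references it returns are still relative to the stylesheet's own URL. It is written in ES5 so it should run inside a PhantomJS script as well as under Node.js.

```javascript
// Hypothetical helper: list the files a stylesheet refers to.
// Regex-based sketch; does not handle CSS comments or escaped characters.
function extractCssUrls(cssText) {
    var urls = [];
    // Matches url(foo.png), url('foo.png') and url("foo.png")
    var urlRe = /url\(\s*(['"]?)([^'")]+)\1\s*\)/g;
    // Matches @import "foo.css" and @import 'foo.css' (the url(...) form is caught above)
    var importRe = /@import\s+(['"])([^'"]+)\1/g;
    var m;
    while ((m = urlRe.exec(cssText)) !== null) {
        urls.push(m[2].trim());
    }
    while ((m = importRe.exec(cssText)) !== null) {
        urls.push(m[2]);
    }
    // Inline data: URIs are embedded in the CSS and need no download
    return urls.filter(function (u) { return u.indexOf('data:') !== 0; });
}
```

Each returned reference would then be resolved against the stylesheet's URL, downloaded, and (if it is another stylesheet) parsed the same way.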

I tried wget, which does exactly what I need but is very slow because it uses a single TCP connection to the web server, and HTTrack, which downloads entire websites, whereas I only want the content required to render the page, found by recursively parsing the links in each file. I am now trying to use PhantomJS for this but could not find the right way to use it.
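The recursive part, with several downloads running at once instead of one at a time, can be sketched as a small scheduler. This sketch assumes Node.js rather than PhantomJS (it relies on URL, Set, Map and promises, which PhantomJS's older JavaScript engine may not provide). collectRequisites, fetchText and extractLinks are hypothetical names: you would supply the last two, one to download a file and one to return the links found in it.

```javascript
// Sketch: gather every file reachable from startUrl, keeping up to maxParallel
// downloads in flight at once. fetchText and extractLinks are hypothetical hooks:
//   fetchText(url)          -> Promise of the file's text
//   extractLinks(url, text) -> array of (possibly relative) links found in the file
function collectRequisites(startUrl, fetchText, extractLinks, maxParallel) {
    return new Promise((resolve, reject) => {
        const limit = Math.max(1, maxParallel | 0); // at least one fetch at a time
        const start = new URL(startUrl).href;       // normalize, e.g. add the trailing "/"
        const seen = new Set([start]);              // every URL ever queued
        const queue = [start];                      // URLs waiting to be fetched
        const files = new Map();                    // url -> downloaded text
        let active = 0;                             // fetches currently in flight
        let failed = false;

        function pump() {
            if (failed) return;
            if (queue.length === 0 && active === 0) {
                resolve(files);                     // nothing queued, nothing running: done
                return;
            }
            // Start new fetches until the parallelism limit is reached
            while (active < limit && queue.length > 0) {
                const url = queue.shift();
                active++;
                Promise.resolve()
                    .then(() => fetchText(url))
                    .then((text) => {
                        files.set(url, text);
                        for (const link of extractLinks(url, text)) {
                            // Resolve relative links against the file they came from
                            const absolute = new URL(link, url).href;
                            if (!seen.has(absolute)) {
                                seen.add(absolute);
                                queue.push(absolute);
                            }
                        }
                        active--;
                        pump();
                    })
                    .catch((err) => {
                        failed = true;
                        reject(err);
                    });
            }
        }

        pump();
    });
}
```

Every time a download finishes, pump() refills the free slots from the queue, so up to maxParallel requests stay in flight until nothing is left. In Node.js 18 or later, fetchText could be url => fetch(url).then(r => r.text()) for text files such as HTML and CSS; images and other binary files would need to be saved from r.arrayBuffer() instead.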

Upvotes: 0

Views: 2587

Answers (1)

Oleksii Dniprovskyi

Reputation: 135

Try this code:

var page = require('webpage').create();
var fs = require('fs');

var url = "your url goes here";
var path = 'index.html'; // change the extension (e.g. .json, .txt) if you need a different format

page.open(url, function (status) {
    if (status !== 'success') {
        console.log('Connection failed, page was not loaded!');
    } else {
        // page.content is the page's current DOM serialized back to HTML
        var content = page.content;
        fs.write(path, content, 'w');
    }
    // Exit on success and failure alike so the script does not hang
    phantom.exit();
});

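This should save the full HTML of the page as PhantomJS sees it (page.content is the current DOM, so it reflects any changes made by the page's scripts). It does not download the linked CSS, JavaScript, or image files; those have to be fetched separately. If you need further assistance, please let me know!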
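PhantomJS can also report which resources a page actually loads, which gets closer to what the question asks for than saving page.content alone. A minimal sketch, assuming PhantomJS 2.x: page.onResourceReceived and page.settings.javascriptEnabled are PhantomJS APIs, while recordResource is a hypothetical helper. It only lists the URLs; each file still has to be downloaded separately. Turning JavaScript off matches the "do not execute scripts" requirement, but it may also stop external script files from being requested, so those may still need to be found by parsing the HTML.

```javascript
// Hypothetical helper: record each distinct resource URL the page finished loading.
function recordResource(seen, response) {
    // PhantomJS reports a response at stage 'start' and again at stage 'end'
    if (response.stage === 'end' && seen.indexOf(response.url) === -1) {
        seen.push(response.url);
    }
    return seen;
}

// Only wire up the page when running under PhantomJS (the helper works anywhere)
if (typeof phantom !== 'undefined') {
    var page = require('webpage').create();
    var resources = [];
    page.settings.javascriptEnabled = false; // do not run the page's own scripts
    page.onResourceReceived = function (response) {
        recordResource(resources, response);
    };
    page.open('your url goes here', function (status) {
        if (status === 'success') {
            console.log(resources.join('\n'));
        } else {
            console.log('Connection failed, page was not loaded!');
        }
        phantom.exit();
    });
}
```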

Upvotes: 1
