Jake Wilson

Reputation: 91193

wget + JavaScript?

I have this webpage that uses client-side JavaScript to format data on the page before it's displayed to the user.

Is it possible to somehow use wget to download the page and use some sort of client-side JavaScript engine to format the data as it would be displayed in a browser?

Upvotes: 38

Views: 73629

Answers (4)

sh1

Reputation: 4741

If you just need the links, you can parse them out of the output of elinks' -dump option.

This also gives an ASCII rendition of the document, but it does not give processed HTML as far as I can see.

A filter which works for me is something like:

sed -ne 's!^.*[0-9].*\(https://.*\)$!\1!p'
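For example, assuming elinks is installed and the page's links appear in the numbered reference list that -dump appends (note the pattern only captures https:// links; adjust it if you also need plain http):

$> elinks -dump 'https://example.com/' | sed -ne 's!^.*[0-9].*\(https://.*\)$!\1!p'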

Upvotes: 0

user4401178

Here is a simple little PhantomJS script that triggers the JavaScript on a webpage and allows you to pull it down locally:

file: get.js

var page = require('webpage').create(),
    system = require('system'),
    address;

// The URL to fetch is the first command-line argument.
address = system.args[1];

// Scroll down the page to trigger any lazy-loaded content.
page.scrollPosition = { top: 4000, left: 0 };

page.open(address, function (status) {
  if (status !== 'success') {
    console.log('** Error loading url.');
  } else {
    // Print the DOM as rendered after the page's JavaScript has run.
    console.log(page.content);
  }
  phantom.exit();
});

Use it as follows:
$> phantomjs /path/to/get.js "http://www.google.com" > "google.html"

Change /path/to, the URL, and the output filename to what you want.

Upvotes: 9

Alex Wayne

Reputation: 187024

You could probably make that happen with something like PhantomJS.

You can write a PhantomJS script that loads the page as a browser would, and then either take screenshots or use JavaScript to inspect the page and pull out the data.
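A minimal sketch of that approach (the URL and the output filename here are just placeholders):

var page = require('webpage').create();

page.open('http://example.com/', function (status) {
  if (status === 'success') {
    // Save a screenshot of the rendered page.
    page.render('example.png');

    // Run code inside the page context to pull out data.
    var title = page.evaluate(function () {
      return document.title;
    });
    console.log('Page title: ' + title);
  }
  phantom.exit();
});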

Upvotes: 28

drowe

Reputation: 2320

Not with wget alone, as I doubt it includes any form of JavaScript engine. However, you could use a WebKit-based tool to process the page and then capture the resulting output.

You could use something like this as a starting point for how to get the content: http://situated.wordpress.com/2008/06/04/take-screenshots-of-a-website-from-the-command-line/

Upvotes: 2
