Reputation: 91193
I have this webpage that uses client-side JavaScript to format data on the page before it's displayed to the user.
Is it possible to somehow use wget to download the page and use some sort of client-side JavaScript engine to format the data as it would be displayed in a browser?
Upvotes: 38
Views: 73629
Reputation: 4741
If you just need the links, you can parse them out of the output of elinks' -dump feature.
This also gives an ASCII rendition of the document, but it does not give the processed HTML as far as I can see.
A filter which works for me is something like:
sed -ne 's!^.*[0-9].*\(https://.*\)$!\1!p'
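For example (https://example.com/ is just a placeholder here, and this assumes elinks is configured to append its numbered link references to the dump, via the document.dump.references option), the filter can be fed straight from elinks:
elinks -dump 'https://example.com/' | sed -ne 's!^.*[0-9].*\(https://.*\)$!\1!p'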
Upvotes: 0
Reputation:
Here is a simple PhantomJS script that triggers the JavaScript on a webpage and lets you pull the rendered result down locally:
file: get.js
var page = require('webpage').create(),
    system = require('system'),
    address = system.args[1];              // URL passed on the command line

// Start with the viewport scrolled down (helps with pages that load content on scroll).
page.scrollPosition = { top: 4000, left: 0 };

page.open(address, function (status) {
    if (status !== 'success') {
        console.log('** Error loading url.');
    } else {
        // Print the page's HTML as it stands after its scripts have run.
        console.log(page.content);
    }
    phantom.exit();
});
Use it as follows:
$> phantomjs /path/to/get.js "http://www.google.com" > "google.html"
Change /path/to, the URL, and the output filename to whatever you want.
Upvotes: 9
Reputation: 187024
You could probably make that happen with something like PhantomJS.
You can write a PhantomJS script that will load the page the way a browser would, and then either take screenshots or use JavaScript to inspect the page and pull out data.
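For example, here is a minimal sketch along those lines (the URL and the screenshot filename are placeholders, not anything prescribed by PhantomJS):

var page = require('webpage').create();

page.open('http://example.com/', function (status) {
    if (status !== 'success') {
        console.log('** Error loading url.');
        phantom.exit(1);
        return;
    }
    // Screenshot of the page as rendered after its JavaScript has run.
    page.render('screenshot.png');

    // Run code inside the page's own context to pull out data.
    var title = page.evaluate(function () {
        return document.title;
    });
    console.log(title);

    phantom.exit();
});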
Upvotes: 28
Reputation: 2320
Not with wget alone, as I doubt it includes any form of JavaScript engine. However, you could use WebKit to process the page and produce the output you're after.
Something like this could serve as a base for how to get the rendered content: http://situated.wordpress.com/2008/06/04/take-screenshots-of-a-website-from-the-command-line/
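As one illustration (not taken from that post), the WebKit-based wkhtmltoimage command-line tool follows the same idea, rendering a page to an image:
wkhtmltoimage 'http://example.com/' example.png
As with any screenshot approach, this gives you a picture of the formatted page rather than the processed HTML.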
Upvotes: 2