Reputation: 15821
I want to periodically scrape my blog for links and archive the pages I link to, lest they be lost forever in the sands of time. What's the best way to save them such that when I later want to view them, I can see them as they would have appeared if I'd clicked the link when they'd still been up?
Many web browsers seem to have this functionality bound to Ctrl/Cmd-S. Is there a good way to do it programmatically?
Upvotes: 1
Views: 195
Reputation: 44821
You don't mention your technology stack, so presumably anything goes.
It looks to me like PhantomJS might be what you're looking for: it's a headless WebKit browser, so it can load your pages exactly as a browser would and render them out to PDF.
http://code.google.com/p/phantomjs/wiki/QuickStart#Rendering
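A minimal sketch of what such a render script could look like (the URL and output filename are placeholders; swap in the links scraped from your blog):

```javascript
// save_page.js -- minimal PhantomJS render sketch.
var page = require('webpage').create();

// For PDF output, paperSize controls pagination; without it you get
// one very long page.
page.paperSize = { format: 'A4', orientation: 'portrait', margin: '1cm' };

// Placeholder URL -- replace with a link harvested from your blog.
page.open('http://example.com/some-linked-page', function (status) {
    if (status !== 'success') {
        console.log('Failed to load the page.');
        phantom.exit(1);
    } else {
        // render() infers the format from the extension (.pdf, .png, .jpg).
        page.render('archived-page.pdf');
        phantom.exit(0);
    }
});
```

Run it with `phantomjs save_page.js`; archiving all your outbound links is then a matter of queuing one `page.open()` per URL.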
Upvotes: 1