Reputation: 3671
I am interested in learning more about screen scraping in Mac OS X.
Basically, the reason I am thinking about it is that there is a comedian who has a pretty funny faux Twitter account, and the only way to read all of the past tweets is through an archive site that I found. The site is set up as a main page with a link to every day for the past two years (just basic anchor tags running down the page). It's a really simple site, and I figured that if I could scrape the data and put it into a file, I could read through it all in one place instead of clicking on hundreds of links.
I'm basically using this as an excuse to learn this method of coding.
I've Googled but can't find much. I understand the PHP examples a bit (I'm decent at PHP), but I'm not sure whether it's possible to follow the links on a page and scrape the pages they lead to. Scraping a single page seems relatively easy.
My other question is: how do you run the code? I've seen several programs for Windows and Linux, but nothing yet that I could use on Mac OS X (I'm on OS X 10.8).
Could someone point me in the right direction? Thanks!
Upvotes: 0
Views: 2242
Reputation: 7005
Consider this project an excuse to learn Python. It's pretty quick to get up to speed with, and it has lots of great packages that handle almost anything you can dream up, including this.
I bookmarked this a few weeks back:
It's a Python WebKit client. You'll basically be able to pull whatever you want from the page with a few lines of code.
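For a site as simple as the one you describe (an index page of plain anchor tags), you may not even need a WebKit client. Here is a rough sketch that uses only Python 3's standard library, not the client mentioned above. It reads the index page, collects every link, fetches each linked page, and appends everything to one file. The function names, the output file name, and the one-second delay are my own choices for illustration, and I'm assuming the pages are static HTML that can be decoded as UTF-8.

```python
import time
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip anchors that have no href attribute
                self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    """Return absolute URLs for all links found in an HTML string."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


def fetch(url):
    """Download a page and return it as text (assumes UTF-8)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def scrape_archive(index_url, out_path, delay=1.0):
    """Fetch every page linked from index_url and write them all to one file."""
    index_html = fetch(index_url)
    with open(out_path, "w", encoding="utf-8") as out:
        for link in extract_links(index_html, index_url):
            out.write("==== " + link + " ====\n")
            out.write(fetch(link))
            out.write("\n")
            time.sleep(delay)  # pause between requests to go easy on the server
```

To run it on a Mac, save it as something like `scrape.py`, add a call such as `scrape_archive("http://example.com/archive/", "tweets.html")` at the bottom (that URL is a placeholder, swap in the real index page), then open Terminal and run `python3 scrape.py`. This assumes Python 3 is installed. The output is still raw HTML, so you would probably want a second pass that pulls out just the tweet text once you know how the day pages are marked up.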
Upvotes: 1