Reputation: 10386
I have access to a web interface for a large amount of data. This data is usually accessed by people who only want a handful of items. The company that I work for wants me to download the whole set. Unfortunately, the interface only allows you to see fifty elements (of tens of thousands) at a time, and segregates the data into different folders.
To complicate matters, all of the data is served from the same URL, which updates itself dynamically through AJAX calls to an ASPX backend. Writing a simple curl script to grab the data is difficult because of this and because of the authentication required.
How can I write a script that navigates around a page, triggers ajax requests, waits for the page to update, and then scrapes the data? Has this problem been solved before? Can anyone point me towards a toolkit?
Any language is fine, I have a good working knowledge of most web and scripting languages.
Thanks!
Upvotes: 0
Views: 87
Reputation: 38924
Maybe this?
Website scraping using jquery and ajax
http://www.kelvinluck.com/2009/02/data-scraping-with-yql-and-jquery/
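The linked article does the scraping with YQL and jQuery, but the underlying idea (fetch the page, pick nodes out with an XPath expression) works anywhere. A rough Python sketch of the same idea; the URL and XPath are placeholders, not from your site:

    # Rough Python equivalent of the linked YQL/jQuery approach:
    # fetch a page and pull nodes out with an XPath expression.
    # The URL and XPath below are placeholders.
    import requests
    from lxml import html

    page = requests.get("http://example.com/data")
    tree = html.fromstring(page.content)
    for row in tree.xpath("//table//tr"):
        print(row.text_content().strip())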
Upvotes: 1
Reputation: 55447
I usually use a program like Fiddler or Live HTTP Headers and just watch what's happening behind the scenes. 99.9% of the time you'll see that there's a query string or REST call with a very simple pattern that you can emulate.
Upvotes: 1
Reputation: 105059
Have you thought of using a tool like WatiN? It is actually meant for UI testing, but I suppose you could use it to programmatically make requests anywhere and act upon the responses.
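The same browser-automation idea, sketched in Python with Selenium rather than WatiN; the URL, link text, and CSS selector are made-up placeholders:

    # Navigate / trigger AJAX / wait / scrape, via a real browser.
    # The URL, link text, and selector below are placeholders.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()
    driver.get("https://example.com/data")

    # Clicking a folder link fires the AJAX call that swaps the content.
    driver.find_element(By.LINK_TEXT, "Folder A").click()

    # Wait for the updated rows to appear, then scrape them.
    rows = WebDriverWait(driver, 10).until(
        lambda d: d.find_elements(By.CSS_SELECTOR, "table tr")
    )
    for row in rows:
        print(row.text)

    driver.quit()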
But since you can do whatever you please, you could also just make ordinary web requests from a desktop application and parse the results. You can customize it to your own needs, and simulate AJAX requests at will by setting certain request headers.
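A minimal sketch of that approach, assuming a hypothetical ASPX endpoint and form fields; the X-Requested-With header is the usual way to make a request look like it came from XMLHttpRequest:

    # Minimal sketch of simulating an AJAX POST by hand; the URL, form
    # fields, and response format are assumptions.
    import requests

    headers = {
        "X-Requested-With": "XMLHttpRequest",   # marks the request as AJAX
        "Referer": "https://example.com/data",  # some backends check this too
    }
    resp = requests.post(
        "https://example.com/Data.aspx",
        headers=headers,
        data={"folder": "reports", "page": 1},
    )
    print(resp.text)   # parse however the server responds (HTML fragment, JSON, ...)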
Upvotes: 1