So8res

Reputation: 10386

Downloading a lot of slippery data

I have access to a web interface for a large amount of data. This data is usually accessed by people who only want a handful of items. The company that I work for wants me to download the whole set. Unfortunately, the interface only allows you to see fifty elements (of tens of thousands) at a time, and segregates the data into different folders.

To make matters worse, all of the data lives at the same URL, which updates itself dynamically through AJAX calls to an ASPX back end. Between that and the authentication required, writing a simple curl script to grab the data is difficult.

How can I write a script that navigates around a page, triggers AJAX requests, waits for the page to update, and then scrapes the data? Has this problem been solved before? Can anyone point me towards a toolkit?

Any language is fine; I have a good working knowledge of most web and scripting languages.

Thanks!

Upvotes: 0

Views: 87

Answers (3)

Chris Haas

Reputation: 55447

I usually use a program like Fiddler or Live HTTP Headers and just watch what's happening behind the scenes. 99.9% of the time you'll see that there's a query string or REST call with a very simple pattern that you can emulate.
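Once you've spotted the underlying call, replaying it directly is usually straightforward. A minimal sketch in Python with `requests`; the endpoint, login URL, and parameter names here are placeholders for whatever Fiddler shows the page actually calling when you page through the data:

```python
import requests

# Hypothetical endpoint -- substitute the real call Fiddler shows
# when the page fetches the next batch of 50 items.
DATA_URL = "https://example.com/data/Items.aspx/GetPage"

session = requests.Session()
# Authenticate once so the session cookie is reused for every request.
session.post("https://example.com/login.aspx",
             data={"user": "me", "pass": "secret"})

rows = []
page = 0
while True:
    resp = session.get(DATA_URL,
                       params={"folder": "A", "offset": page * 50, "count": 50})
    resp.raise_for_status()
    batch = resp.json()
    if not batch:          # empty page means we've walked off the end
        break
    rows.extend(batch)
    page += 1

print(f"Fetched {len(rows)} items")
```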

Upvotes: 1

Robert Koritnik

Reputation: 105059

If you need to directly control a browser

Have you thought of using a tool like WatiN? It's actually meant for UI testing, but you could use it to make requests programmatically, act on the responses, and drive the page just as a user would.
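WatiN is .NET; if you'd rather script it in Python, the same browser-driving idea works with Selenium. A rough sketch under assumed page structure (the URL, element IDs, and "Next" link text are placeholders for whatever the real page uses):

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox()
driver.get("https://example.com/data.aspx")  # placeholder URL

# Log in through the real form so the browser session carries the auth.
driver.find_element(By.ID, "username").send_keys("me")
driver.find_element(By.ID, "password").send_keys("secret")
driver.find_element(By.ID, "loginButton").click()

wait = WebDriverWait(driver, 30)
results = []
while True:
    # Wait until the AJAX-updated grid is present before scraping it.
    table = wait.until(EC.presence_of_element_located((By.ID, "resultsGrid")))
    for row in table.find_elements(By.TAG_NAME, "tr"):
        results.append([c.text for c in row.find_elements(By.TAG_NAME, "td")])

    next_links = driver.find_elements(By.LINK_TEXT, "Next")
    if not next_links:
        break
    next_links[0].click()               # triggers the next AJAX page load
    wait.until(EC.staleness_of(table))  # wait for the old grid to be replaced

driver.quit()
print(f"Scraped {len(results)} rows")
```

The explicit waits are the important part: they let the script block until each AJAX update has actually landed instead of scraping a half-updated page.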

If you just need to get the data

But since you can do whatever you please, you can also just make ordinary web requests from a desktop application or script and parse the results yourself, customized to your own needs. You can simulate AJAX requests at will by setting the appropriate request headers.
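For ASP.NET pages in particular, the AJAX calls often expect an `X-Requested-With` header and, for partial postbacks, the hidden `__VIEWSTATE`/`__EVENTVALIDATION` fields echoed back from the page. A hedged sketch, with an illustrative URL and event-target values that would have to come from the real page:

```python
import requests
from bs4 import BeautifulSoup

session = requests.Session()
page = session.get("https://example.com/data.aspx")  # illustrative URL
soup = BeautifulSoup(page.text, "html.parser")

# ASP.NET postbacks usually require echoing these hidden fields back.
state = {name: soup.find("input", {"name": name})["value"]
         for name in ("__VIEWSTATE", "__EVENTVALIDATION")
         if soup.find("input", {"name": name})}

headers = {
    "X-Requested-With": "XMLHttpRequest",   # marks the request as AJAX
    "Referer": "https://example.com/data.aspx",
}

# "__EVENTTARGET"/"__EVENTARGUMENT" values are hypothetical; they mimic
# clicking a pager control to request the next block of rows.
resp = session.post(
    "https://example.com/data.aspx",
    data={**state, "__EVENTTARGET": "gridPager", "__EVENTARGUMENT": "Page$2"},
    headers=headers,
)
print(resp.status_code, len(resp.text))
```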

Upvotes: 1
