Anand Verma

Reputation: 51

web scrapers and harvesters

Web scrapers or harvesters are software that fetches data from a website. I would be very grateful if anybody could suggest some of the software packages available on the market.
They must be able to harvest dynamically built websites (for example, ones that load content via AJAX).

Upvotes: 0

Views: 407

Answers (1)

Flavien Volken

Reputation: 21349

A web scraper usually follows the hard links on a page (hrefs) to find the next page to fetch. With AJAX this is quite different: the content is sent to the client only on demand. As I do not know of any web scraper that offers a really efficient way to specify request parameters, I would build my own tool for this. It would basically consist of forging my own requester and plugging it into the server's web service. You can do this in any language that supports HTTP GET/POST requests.
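As a rough illustration of what such a hand-made requester could look like, here is a minimal sketch in JavaScript. It assumes Node 18 or later (for the global `fetch`), and the names `buildRequest` and `callService`, the host `example.com`, and the parameter names are all hypothetical placeholders, not part of any existing tool. It also assumes the service accepts form-encoded POST bodies, which may not hold for every service.

```javascript
// Sketch of a tiny "requester" for a web service discovered in the browser.
// servicePath, the parameter names, and example.com are placeholders.

// Build the URL and fetch options for a GET or POST request.
function buildRequest(servicePath, method, params) {
  const url = new URL(servicePath);
  const verb = method.toUpperCase();

  if (verb === "GET") {
    // For GET, the parameters travel in the query string.
    for (const [name, value] of Object.entries(params)) {
      url.searchParams.set(name, String(value));
    }
    return { url: url.toString(), init: { method: "GET" } };
  }

  // For other methods, send a form-encoded body (an assumption about the service).
  const body = new URLSearchParams(params).toString();
  return {
    url: url.toString(),
    init: {
      method: verb,
      headers: { "Content-Type": "application/x-www-form-urlencoded" },
      body,
    },
  };
}

// Send the request and return the raw response text.
async function callService(servicePath, method, params) {
  const { url, init } = buildRequest(servicePath, method, params);
  const response = await fetch(url, init);
  if (!response.ok) {
    throw new Error(`HTTP ${response.status} for ${url}`);
  }
  return response.text();
}
```

A call such as `callService("https://example.com/servicePath.php", "POST", { your: "forgedRequest" }).then(console.log)` would then play the role of the browser. Keeping `buildRequest` separate makes it easy to inspect the forged request before anything is sent.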

To work out how to forge the request:

  1. Install a WebKit browser (I would suggest Safari, for security-policy reasons).
  2. Go to the public page that communicates with the web service you are interested in.
  3. Make a regular request.
  4. Using Safari's Web Inspector, look at what appeared in the Network tab when you made the request.
  5. In the headers you will find the Request URL (servicePath) as well as the method used. If it is a GET, it is simple: you only have to change the parameters in the URL to forge your own. If it is a POST, you have to look deeper into the data that was sent in order to send similar data.
  6. You can test posting to the server with JavaScript. Here is how I proceed: jQuerify the page so that the JavaScript console can call jQuery methods. You can do this by adding the jQuerify bookmarklet.
  7. In the Web Inspector console (press Esc to make it appear if hidden), try your forged POST as follows (the data here is a JSON-style object, which jQuery sends as form-encoded fields by default):

$.post("servicePath.php", {"your": "forgedRequest"}, function (data) { alert(data); });
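For the POST case in step 5, one way to "send similar data" is to copy the form-encoded body shown in the Network tab and change only the fields you care about. The sketch below assumes the captured body is form-encoded; the helper name `forgeBody` and the field names `query`, `page`, and `sort` are made up for illustration.

```javascript
// Rebuild a captured form-encoded POST body with some fields overridden.
// The captured string and field names below are hypothetical examples.
function forgeBody(capturedBody, overrides) {
  const fields = new URLSearchParams(capturedBody);
  for (const [name, value] of Object.entries(overrides)) {
    // set() replaces an existing field or appends a new one.
    fields.set(name, String(value));
  }
  return fields.toString();
}

const captured = "query=shoes&page=1&sort=price";
console.log(forgeBody(captured, { page: 2 }));
// prints "query=shoes&page=2&sort=price"
```

The resulting string can be passed as the data argument of `$.post` in the console, or used as the request body in a standalone script.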

Upvotes: 1
