Satheesh

Common output format of a web crawler

I need to integrate an existing application with social media monitoring. What is the common output format of a crawler? Will it be XML or JSON? Or does it vary depending on the crawler, e.g. one written in Python versus one written in Java?

Answers (1)

Stewart McKee

It will vary. Also, you probably don't want a single output 'file', since the site could be huge.
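
One common way around the single-huge-file problem is to write one self-contained record per crawled page, for example as JSON Lines (one JSON object per line). The sketch below is only an illustration of that idea, not the output of any particular crawler. The function names and the `url`/`status` fields are hypothetical, and an in-memory stream stands in for a file opened in append mode.

```python
import io
import json


def write_page_record(record: dict, stream) -> None:
    """Append one crawled page as a single JSON Lines record."""
    # One compact JSON object per line, with no enclosing array, so the output
    # never has to be held in memory or rewritten as the crawl grows.
    stream.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_page_records(stream):
    """Yield records back one at a time, skipping blank lines."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Usage: in a real crawl, `out` would be something like open("pages.jsonl", "a").
out = io.StringIO()
write_page_record({"url": "http://example.com/", "status": 200}, out)
write_page_record({"url": "http://example.com/about", "status": 404}, out)
out.seek(0)
records = list(read_page_records(out))
```

Because each line stands alone, a consumer can process pages while the crawl is still running, and a crash loses at most the last partial line.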

I've written a crawler in Ruby called cobweb (http://github.com/stewartmckee/cobweb) that uses a hash for its data model. As each page is received, you are handed that hash and can perform whatever actions you wish on it.
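
To illustrate the per-page callback pattern described above, here is a minimal Python sketch. It is not cobweb's API (cobweb is Ruby). The `crawl` function, the page fields (`url`, `status_code`, `body`, `links`) and the in-memory `site` dictionary that replaces real HTTP fetching are all hypothetical.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple

# Hypothetical in-memory "site": URL -> (status code, body, outgoing links).
Site = Dict[str, Tuple[int, str, List[str]]]


def crawl(site: Site, start_url: str, on_page: Callable[[dict], None],
          max_pages: int = 100) -> int:
    """Breadth-first crawl that hands each page to `on_page` as a plain dict.

    Returns the number of pages delivered to the callback.
    """
    queue = deque([start_url])
    seen = {start_url}
    delivered = 0
    while queue and delivered < max_pages:
        url = queue.popleft()
        # Unknown URLs are reported as 404 pages with no links.
        status, body, links = site.get(url, (404, "", []))
        # The per-page dict: the caller decides what to do with it
        # (store it, serialize it, forward it to another system, ...).
        on_page({"url": url, "status_code": status, "body": body, "links": links})
        delivered += 1
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return delivered


# Usage: collect the visited URLs; a real handler might write JSON instead.
site = {
    "/": (200, "home", ["/a", "/b"]),
    "/a": (200, "page a", ["/", "/missing"]),
    "/b": (200, "page b", []),
}
visited = []
count = crawl(site, "/", lambda page: visited.append(page["url"]))
```

Because the crawler only emits a plain dictionary per page, the output format (XML, JSON, database rows) is left to the callback rather than being fixed by the crawler.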

Out of interest, what information are you expecting to get out of the crawl? I was just thinking that a relatively simple addition would be a web API for cobweb. Would that be something you could use?