Reputation: 3179
I would like to implement a search engine that crawls a set of web sites, extracts specific information from the pages, and creates a full-text index of that information.
It seems to me that Xapian could be a good choice for the search engine library.
What are the options for a crawler/parser to integrate with Xapian?
Would Solr be a better choice than Xapian to integrate with open source crawlers/parsers?
Upvotes: 2
Views: 1599
Reputation: 99750
Here's a little comparison between Xapian and Solr.
But if you want to build a crawler, take a look at Nutch. It's extensible with plugins, so you could write a plugin that extracts the specific information you're looking for.
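To make the crawl / extract / index split from the question concrete, here is a minimal sketch using only the Python standard library. It is not Xapian, Solr, or Nutch code: `TinyIndex` is a hypothetical in-memory stand-in for whichever search library you end up choosing, the choice of `<title>` and `<h1>` as the "specific information" is just an assumption for illustration, and the crawler is replaced by a dict of already-fetched pages.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class FieldExtractor(HTMLParser):
    """Collects text only from the tags we care about.

    WANTED is an assumption for this sketch; swap in whatever
    elements hold the information you actually need.
    """

    WANTED = {"title", "h1"}

    def __init__(self):
        super().__init__()
        self._depth = 0      # > 0 while inside a wanted tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.WANTED:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag in self.WANTED and self._depth > 0:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth > 0 and data.strip():
            self.chunks.append(data.strip())


def extract_fields(html):
    """Return the text of the wanted elements, joined by spaces."""
    parser = FieldExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    """Hypothetical stand-in for a real full-text index (term -> set of URLs)."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, url, text):
        for term in tokenize(text):
            self.postings[term].add(url)

    def search(self, query):
        """Return URLs containing every query term (boolean AND)."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


def build_index(pages):
    """pages: dict of url -> raw HTML, as a crawler would hand them over."""
    index = TinyIndex()
    for url, html in pages.items():
        index.add(url, extract_fields(html))
    return index
```

In a real setup the crawler (Nutch, for example) would supply the pages, the extraction step would live in a parser plugin or a separate script, and `TinyIndex.add` would be replaced by a call into the search library you pick. Keeping the three stages separate like this means the choice between Xapian and Solr only affects the last one.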
Upvotes: 2