Reputation: 1633
Which is the best way to integrate Apache Tika, given that I already have Nutch (2.2.1) connected to Solr (4.3) and working?
I understand that Tika can be integrated into Nutch and/or Solr, but which is the better choice?
Upvotes: 0
Views: 3159
Reputation: 1491
Set up the Tika plugin in Nutch. Nutch will then parse the fetched documents and do all the hard work for you.
I would suggest setting it up in Solr as well: you may want to send documents to Solr directly with the curl
command, and for that Tika needs to be available on the Solr side too. It takes little extra configuration and has no performance cost.
There is a guide to setting up Tika and the extracting request handler here
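For the Solr side, sending a file to the extracting request handler could look roughly like the sketch below. It is only an illustration, not taken from the guide: it assumes a default single-core Solr 4.x on `localhost:8983` with the handler registered at `/update/extract`, and the helper names `build_extract_url` and `post_file` are made up for this example. The `literal.id` and `commit` request parameters are the ones the handler accepts for setting the document id and committing right away.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: default single-core Solr 4.x on localhost; adjust to your install.
SOLR_EXTRACT_URL = "http://localhost:8983/solr/update/extract"


def build_extract_url(doc_id, base_url=SOLR_EXTRACT_URL, commit=True):
    """Build the request URL; literal.id sets the id field of the new document."""
    params = {"literal.id": doc_id}
    if commit:
        # Commit immediately so the document is searchable right after the post.
        params["commit"] = "true"
    return base_url + "?" + urlencode(params)


def post_file(path, doc_id, content_type="application/octet-stream"):
    """Send the raw file bytes to Solr, which passes them to Tika for parsing."""
    with open(path, "rb") as f:
        body = f.read()
    request = Request(
        build_extract_url(doc_id),
        data=body,  # supplying a body makes this a POST request
        headers={"Content-Type": content_type},
    )
    with urlopen(request) as response:
        return response.status
```

With Solr running, a call such as `post_file("report.pdf", "doc1", "application/pdf")` (file name and id are placeholders) would do the same job as the curl command mentioned above.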
Upvotes: 1