user3279550

Reputation: 108

What are the benefits of applying Apache Tika in Solr instead of Nutch?

I am trying to crawl data with Apache Nutch and index it with Apache Solr.

As part of this, I want to parse the content as well. I am trying to figure out whether it is better to apply Tika in Nutch, in Solr, or in both.

Upvotes: 0

Views: 184

Answers (1)

Alexandre Rafalovitch

Reputation: 9789

Apply it as early as you can, but make sure to keep the original, full-fidelity document somewhere as well.

There is no point in passing a binary file around if you know that, in the end, you are going to reduce it to a set of metadata fields and discard the rest.
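
To make the idea concrete, here is a rough sketch of the "parse early, keep the original" flow. It is only an illustration under assumptions: `extract_fields` is a hypothetical stand-in for the Tika parse step (it is not Tika's API), and `archive_original`, `process_at_crawl_time`, and the content-addressed archive directory are names invented for this example, not anything Nutch or Solr provides.

```python
import hashlib
import tempfile
from pathlib import Path


def archive_original(raw: bytes, archive_dir: Path) -> str:
    """Store the untouched bytes under their SHA-256 digest and return the digest."""
    digest = hashlib.sha256(raw).hexdigest()
    archive_dir.mkdir(parents=True, exist_ok=True)
    (archive_dir / digest).write_bytes(raw)
    return digest


def extract_fields(raw: bytes) -> dict:
    """Hypothetical stand-in for the Tika parse step (assumption, not Tika's API)."""
    text = raw.decode("utf-8", errors="replace")
    lines = text.strip().splitlines()
    return {"title": lines[0] if lines else "", "content": text}


def process_at_crawl_time(raw: bytes, archive_dir: Path) -> dict:
    """Parse early, keep the original, and pass only small fields downstream."""
    doc = extract_fields(raw)
    # The index document carries a pointer to the original, not the binary itself.
    doc["original_id"] = archive_original(raw, archive_dir)
    return doc


archive = Path(tempfile.mkdtemp())
fetched = b"Example title\nBody text of the fetched page."
index_doc = process_at_crawl_time(fetched, archive)
```

Only `index_doc` (a few text fields plus `original_id`) would then be sent on to the indexer, while the original bytes stay retrievable from the archive if you ever need to re-parse them with different settings.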

Upvotes: 2
