Language Detection in Solr for Nutch documents

Question

How can I use Solr to identify the language of documents obtained by crawling with Nutch?

I installed Nutch 1.9 and Solr 4.8.1. I added a new core named "core-test" to Solr through Core Admin on the Solr Admin page, and I followed the steps in the Solr wiki for language detection during document indexing.

I modified the schema.xml in core-test/conf by adding the field

Then I used Nutch to crawl a set of web pages by running:

crawl seed.txt Test http://localhost:8983/solr/core-test 2

Nutch runs correctly, but the language of the documents is not identified: the field language_s does not appear in the results when I run a query at http://localhost:8983/solr/#/core-test/query with q set to "*:*".
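To make the check above reproducible outside the Admin UI, here is a minimal Python sketch that builds the equivalent match-all request against the core's /select handler and counts how many returned documents carry language_s. The helper names (build_select_url, fetch_docs, count_docs_with_field) are hypothetical, and the sketch assumes the core exposes the standard /select handler with JSON output and that documents have an id field; only the core URL and the field name come from the setup described above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Core URL taken from the crawl command in the question.
SOLR_CORE_URL = "http://localhost:8983/solr/core-test"


def build_select_url(core_url, field="language_s", rows=10):
    """Build a match-all /select URL that returns only `id` and `field`.

    Assumes the core has the standard /select handler and an `id` field.
    """
    params = {"q": "*:*", "fl": f"id,{field}", "rows": rows, "wt": "json"}
    return f"{core_url}/select?{urlencode(params)}"


def count_docs_with_field(docs, field="language_s"):
    """Count the documents (dicts) that actually contain `field`."""
    return sum(1 for doc in docs if field in doc)


def fetch_docs(core_url, field="language_s", rows=10):
    """Query a running Solr instance and return the list of matching documents."""
    with urlopen(build_select_url(core_url, field, rows)) as resp:
        return json.load(resp)["response"]["docs"]
```

With Solr running, count_docs_with_field(fetch_docs(SOLR_CORE_URL)) returning 0 would confirm that no indexed document received the language_s field, which matches the symptom described here.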

