Nutch not indexing specific tag in Solr

Question

I am using the extractor plug-in (https://github.com/BayanGroup/nutch-custom-search) and followed the steps described on GitHub. Here is my configuration:

1) extractors.xml

  title" />

2) nutch-site.xml

  <property>
    <name>plugin.includes</name>
    <value>protocol-http|urlfilter-regex|parse-(text|html|metatags|msexcel|msword|mspowerpoint|pdf)|extractor|scoring-opic|index-(basic|anchor|more|metadata)|query-(basic|site|url|lang)|urlnormalizer-(pass|regex|basic)</value>
  </property>

3) I added the field to schema.xml in both Solr and Nutch.
4) I added the plugin to parse-plugins.xml.

I am not getting any errors, but my data is not being indexed in Solr.
Please help, and thanks!

Jorge Luis · Accepted Answer

I took a quick look at the GitHub repository. Since the code works like a normal ParseFilter, you should be able to check whether the data is being extracted correctly by using the parsechecker command:

$ bin/nutch parsechecker <url>

This should output the usual data extracted by Nutch (contentType, signature, url) and the ParseData (status, title, outlinks, etc.), along with any additional info extracted by the plugin.

You could also use the indexchecker command:

$ bin/nutch indexchecker <url>

This will output the actual fields that are going to be indexed by the active indexing plugin (Solr/ES).
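
If you want to script this check, here is a rough Python sketch. It assumes the checker prints each field on its own line as `name :<tab>value`, which is what I remember the output looking like, so adjust the pattern if your Nutch version prints something different. The helper names (`parse_checker_fields`, `run_checker`) and the `bin/nutch` path are my own placeholders, not part of Nutch or the plugin.

```python
import re
import subprocess

# Assumed output shape: "<name> :<whitespace><value>", one field per line.
FIELD_LINE = re.compile(r"^(\S+)\s+:\s+(.*)$")


def parse_checker_fields(output):
    """Collect field lines from checker output into a dict of value lists."""
    fields = {}
    for line in output.splitlines():
        match = FIELD_LINE.match(line)
        if match is None:
            continue  # skip log lines and anything that is not a field line
        name, value = match.groups()
        fields.setdefault(name, []).append(value.strip())
    return fields


def run_checker(command, url, nutch_bin="bin/nutch"):
    """Run `bin/nutch <command> <url>` and return its standard output."""
    result = subprocess.run(
        [nutch_bin, command, url],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, `parse_checker_fields(run_checker("indexchecker", url))` gives you a dict you can test for the field you declared in extractors.xml. If the field shows up under parsechecker but not under indexchecker, the problem is more likely on the indexing side than in the extraction itself.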
