Aakash Kag

Reputation: 362

Nutch not indexing a specific tag in Solr

I am using the extractor plugin from https://github.com/BayanGroup/nutch-custom-search and followed the steps described on GitHub. Here is my configuration:

1) extractors.xml title" />

2) nutch-site.xml
<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-(text|html|metatags|msexcel|msword|mspowerpoint|pdf)|extractor|scoring-opic|index-(basic|anchor|more|metadata)|query-(basic|site|url|lang)|urlnormalizer-(pass|regex|basic)</value>
</property>
3) I added the field to schema.xml in both Solr and Nutch:   <field name="aakashtitle" type="string" stored="true" indexed="true" multiValued="true"/>
4) I added the plugin to parse-plugins.xml.

I am not getting any errors, but my data is not being indexed in Solr.
Please help, and thanks!

Upvotes: 1

Views: 247

Answers (1)

Jorge Luis

Reputation: 3253

I took a quick look at the GitHub repository. Since the code works like a normal ParseFilter, you should be able to check whether the data is being extracted correctly by using the parsechecker command:

$ bin/nutch parsechecker <URL>

This should output the usual data extracted by Nutch (contentType, signature, url), the ParseData (status, title, outlinks, etc.), and any additional info extracted by the plugin.

You could also use the indexchecker command:

$ bin/nutch indexchecker <URL>

This will output the actual fields that are going to be indexed by the active indexing plugin (Solr/ES).
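
To compare the two stages quickly, the checks above could be wrapped in a small script. This is only a sketch: has_field and check_url are hypothetical helper names (not part of Nutch), aakashtitle is the field from the question, and it assumes you run it from the Nutch runtime directory so that bin/nutch resolves.

```shell
#!/bin/sh
# Sketch only: has_field and check_url are hypothetical helpers, not Nutch commands.

# has_field NAME: succeeds if the literal string NAME appears in the text on stdin.
has_field() {
  grep -q -F -- "$1"
}

# check_url URL FIELD: run both checker commands and report where FIELD shows up.
# Assumption: the current directory is the Nutch runtime directory.
check_url() {
  url=$1
  field=$2
  for cmd in parsechecker indexchecker; do
    if bin/nutch "$cmd" "$url" | has_field "$field"; then
      echo "$cmd: $field found"
    else
      echo "$cmd: $field missing"
    fi
  done
}
```

After sourcing the script, something like `check_url <URL> aakashtitle` would print one line per command. If the field shows up under parsechecker but not under indexchecker, that would suggest the value is extracted at parse time but dropped before indexing; if it is missing from both, the extraction rules themselves are the first thing to look at.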

Upvotes: 2
