I am using the extractor plugin (https://github.com/BayanGroup/nutch-custom-search) and followed the steps described on GitHub. Here is my configuration:
1) extractors.xml
title" />
2) nutch-site.xml
<property>
<name>plugin.includes</name>
<value>protocol-http|urlfilter-regex|parse-(text|html|metatags|msexcel|msword|mspowerpoint|pdf)|extractor|scoring-opic|index-(basic|anchor|more|metadata)|query-(basic|site|url|lang)|urlnormalizer-(pass|regex|basic)</value>
</property>
3) Added the field to schema.xml in both Solr and Nutch:
<field name="aakashtitle" type="string" stored="true" indexed="true" multiValued="true"/>
4) Added the plugin to parse-plugins.xml.
I am not getting any errors, but my data is not being indexed in Solr. Please help, and thanks!
I took a quick look at the GitHub repository. Since the code works like a normal ParseFilter, you should be able to check whether the data is being extracted correctly by using the parsechecker command:
$ bin/nutch parsechecker <URL>
This should output the usual data extracted by Nutch (contentType, signature, url), the ParseData (status, title, outlinks, etc.), and any additional information extracted by the plugin.
You can also use the indexchecker command:
$ bin/nutch indexchecker <URL>
This will output the actual fields that are going to be indexed by the active indexing plugin (Solr/ES).
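If you want to script this check, here is a minimal Python sketch that runs `bin/nutch indexchecker <URL>` and looks for the field from the question (`aakashtitle`). It assumes you run it from the Nutch home directory and that indexchecker prints each field on its own line as `name :<TAB>value`; that line format is an assumption, so adjust the parsing if your Nutch version prints something different. The helper names `field_values` and `run_indexchecker` are hypothetical, not part of Nutch.

```python
from __future__ import annotations

import subprocess
import sys


def field_values(output: str, field: str) -> list[str]:
    """Collect every value printed for `field` in indexchecker-style output.

    ASSUMPTION: each field appears on its own line as "<name> :\t<value>".
    """
    values = []
    for line in output.splitlines():
        # Split on the first " :" only, so values containing ":" survive.
        name, sep, value = line.partition(" :")
        if sep and name.strip() == field:
            values.append(value.strip())
    return values


def run_indexchecker(url: str, nutch_home: str = ".") -> str:
    """Run `bin/nutch indexchecker <url>` and return its standard output."""
    result = subprocess.run(
        [f"{nutch_home}/bin/nutch", "indexchecker", url],
        capture_output=True,
        text=True,
        check=True,  # raise if the nutch command exits with an error
    )
    return result.stdout


if __name__ == "__main__":
    # Usage: python check_field.py <URL> [field]
    target_url = sys.argv[1]
    wanted = sys.argv[2] if len(sys.argv) > 2 else "aakashtitle"
    found = field_values(run_indexchecker(target_url), wanted)
    if found:
        print(f"{wanted}: {found}")
    else:
        print(f"{wanted} not found in indexchecker output")
```

For example, `python check_field.py http://example.com/ aakashtitle` (the script name is hypothetical). If the field shows up in parsechecker but not here, the value is being extracted but is not reaching the indexing step.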