user3919720

Reputation: 11

How to index all metatags in Nutch

I have installed Nutch 1.9 and configured it to successfully crawl with Solr 4.10.1. I am trying to set Nutch to index metadata as outlined here https://wiki.apache.org/nutch/IndexMetatags

How do I set it to index ALL of the metadata on a site? I set the value for metatags.names to * like this:

<property>
    <name>metatags.names</name>
    <value>*</value>
    <description>Names of the metatags to extract, separated by ','. Use '*' to extract all metatags. Prefixes the names with 'metatag.' in the parse-metadata. For instance to index description and keywords, you need to activate the plugin index-metadata and set the
    value of the parameter 'index.parse.md' to 'metatag.description,metatag.keywords'.
    </description>
</property>

but I am unsure how to set the value for index.parse.md without listing individual metatag names. I tried this:

<property>
    <name>index.parse.md</name>
    <value>meta*</value>
    <description>Comma-separated list of keys to be taken from the parse metadata to generate fields. Can be used e.g. for 'description' or 'keywords' provided that these values are generated by a parser (see parse-metatags plugin)
    </description>
</property>

but that doesn't display any metadata when running

bin/nutch indexchecker http://nutch.apache.org/

and I am sure there is metadata on that site because it returns Parse Metadata when running

bin/nutch parsechecker http://nutch.apache.org/

Any help would be greatly appreciated! Thanks

Upvotes: 1

Views: 1372

Answers (1)

zhur

Reputation: 315

The index-metadata plugin doesn't work that way. You have to specify the complete name of each key there, e.g. "metatag.keywords".

Also, the "metatags.names" value "*" is not really a wildcard. You can't put something like "meta*" there either.
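
To illustrate the point about complete names, a minimal sketch of the explicit listing might look like the fragment below. It assumes the pages expose "description" and "keywords" metatags (the names used in the property description quoted in the question) and that the parse-metatags and index-metadata plugins are already enabled in plugin.includes. Substitute whichever metatag names parsechecker reports for your site.

```xml
<!-- Sketch for conf/nutch-site.xml; the metatag names are illustrative assumptions -->
<property>
    <name>index.parse.md</name>
    <!-- each key is spelled out in full; a pattern such as meta* is not expanded -->
    <value>metatag.description,metatag.keywords</value>
</property>
```

Re-running bin/nutch indexchecker http://nutch.apache.org/ after this change should show whether those fields are picked up.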

Upvotes: 0
