Reputation: 261
In nutch-site.xml, under the plugin.includes property, when I write parse-(type1|type2), what does it mean?
Does this mean that for each URL fetched by Nutch, Nutch first parses the content with the type1 parser and then sequentially invokes the type2 parser?
Upvotes: 1
Views: 600
Reputation: 6547
Your assumption is correct; that is how it works. Keep in mind, though, that each plugin can be assigned a certain content type or a set of content types, and it only handles documents of those types. For example, the parse-pdf plugin will not parse msword documents.
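
To make the content-type point concrete, here is a minimal sketch of how the property might look in nutch-site.xml. The plugin list is only an illustration, not a recommended configuration. As far as I know, the value is treated as a regular expression matched against plugin ids, so parse-(html|tika) enables both parse-html and parse-tika, and which of the enabled parsers handles a given document then depends on its content type (the mapping usually lives in conf/parse-plugins.xml).

```xml
<!-- nutch-site.xml (fragment): illustrative values only -->
<property>
  <name>plugin.includes</name>
  <!-- Regular expression over plugin ids; parse-(html|tika) matches
       the ids parse-html and parse-tika, so both parsers are enabled. -->
  <value>protocol-http|urlfilter-regex|parse-(html|tika)|index-basic</value>
</property>
```

With a value like this, an HTML page would normally go to parse-html, while other formats would go to parse-tika, assuming the default content-type mappings are in place.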
Upvotes: 1