Reputation: 1
I'm using Nutch v2 to index my website, but pages that no longer exist (I'm indexing a CMS, so pages can be removed) are not removed from the Solr index.
I have tried setting db.update.purge.404=true in my nutch-default.xml, but that doesn't seem to do anything.
For Nutch v1 I can see that the command-line parameter "-deleteGone" exists, but from the documentation I can only guess that it was removed in v2.
So my question is: how do I configure Nutch v2 to remove nonexistent URLs?
Upvotes: 0
Views: 250
Reputation: 1334
You have to set db.update.purge.404=true in nutch-site.xml, not in nutch-default.xml.
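For reference, a minimal sketch of what that override might look like, assuming the usual Hadoop-style property layout that Nutch config files use. The file location (conf/nutch-site.xml in your Nutch install) is an assumption, so adjust it to your setup:

```xml
<?xml version="1.0"?>
<!-- nutch-site.xml: local overrides (location assumed to be conf/ in your install) -->
<configuration>
  <!-- Override the shipped default so that gone (404) pages are purged -->
  <property>
    <name>db.update.purge.404</name>
    <value>true</value>
  </property>
</configuration>
```

Values in nutch-site.xml take precedence over those in nutch-default.xml, so the defaults file can stay as shipped. If the file already contains a configuration element, only the property block needs to be added inside it.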
Upvotes: 2