Gregorie Vermoesen

Reputation: 1

Nutch V2 (with Solr): deleting documents

I'm using Nutch V2 to index my website, but pages that no longer exist (I'm indexing a CMS, so pages can be removed) are not removed from the Solr index.

I have tried setting db.update.purge.404=true in my nutch-default.xml, but that doesn't seem to do anything.

For Nutch V1 I can see that the command-line parameter "-deleteGone" exists, but from the documentation I can only guess that it was removed in V2.

So my question is: how do I configure Nutch V2 to remove nonexistent URLs?

Upvotes: 0

Views: 250

Answers (1)

Nicomedes E.

Reputation: 1334

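You have to set db.update.purge.404=true in nutch-site.xml, not in nutch-default.xml. nutch-site.xml is the file meant for your own settings, and values there override the defaults in nutch-default.xml.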
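As a rough sketch, the override in nutch-site.xml would look something like the fragment below. It assumes the standard Hadoop-style property layout that Nutch configuration files use. The property name is the one from the question, and the description text is only illustrative. Whether the purge also reaches the Solr index may depend on your version and indexing setup, so treat this as a starting point.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Override of the default (false) from nutch-default.xml.
       Assumption: this file sits in the conf directory that your
       Nutch runtime actually reads. -->
  <property>
    <name>db.update.purge.404</name>
    <value>true</value>
    <description>Illustrative note: purge gone (404) pages during the update step.</description>
  </property>
</configuration>
```

If the file already has a configuration element, add only the property block inside it rather than a second root element.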

Upvotes: 2
