Jevgenij L.

Reputation: 13

Nutch fetches already fetched URLs

I am trying to crawl a website using Nutch. I use commands:

I noticed that Nutch re-fetches already fetched URLs on each loop iteration.

Config I have made:

Added config to nutch-site.xml:

I use commands:

I have tried Nutch 2.2.1 with MySQL and Nutch 2.3 with MongoDB. The result is the same: already fetched URLs are re-fetched on each crawl loop iteration.

What should I do to fetch all the URLs that have not been crawled yet?

Upvotes: 0

Views: 442

Answers (1)

Donatas

Reputation: 11

This is an open issue for Nutch 2.X. I faced it this weekend too.

The fix is scheduled for release 2.3.1: https://issues.apache.org/jira/browse/NUTCH-1922.

Upvotes: 1
