Reputation: 8670
I am using Nutch 2.2.1. It worked well for a few days, but now it does not crawl anything and logs messages like the following:
Limit reached, skipping further inlinks for de.ard.www:http/
Limit reached, skipping further inlinks for de.rbb-online.mediathek:http/
Limit reached, skipping further inlinks for de.rbb-online.www:http/
How can I get rid of this?
Upvotes: 0
Views: 197
Reputation: 68
This is not an error. It means Nutch found more inlinks for that URL than the configured limit (db.max.inlinks) allows: only the first N inlinks are stored and the rest are discarded. By default, db.max.inlinks is set to 10000.
IMHO, if you want to crawl more outlinks per page, you should increase the db.max.outlinks.per.page setting. By default it is set to 100 per page.
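As a rough sketch, both limits could be overridden in conf/nutch-site.xml as below. The property names are the ones mentioned above; the values 20000 and 200 are purely illustrative assumptions, not recommendations. Property names can differ between Nutch releases, so it is worth checking conf/nutch-default.xml in your own installation for the exact name before copying this.

```xml
<?xml version="1.0"?>
<!-- conf/nutch-site.xml: local overrides of nutch-default.xml -->
<configuration>

  <!-- Maximum number of inlinks stored per URL (default 10000).
       20000 is an illustrative value only. -->
  <property>
    <name>db.max.inlinks</name>
    <value>20000</value>
  </property>

  <!-- Maximum number of outlinks processed per page (default 100).
       200 is an illustrative value only. -->
  <property>
    <name>db.max.outlinks.per.page</name>
    <value>200</value>
  </property>

</configuration>
```

Since the "Limit reached" lines are informational, raising the inlink limit only changes how many inlinks are kept; if nothing is being fetched at all, the cause is probably elsewhere in the crawl setup.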
Upvotes: 1