Reputation: 11
I have Nutch 1.14 working with Solr 6.4.2, but Nutch is not crawling (following through) all the links on a page. I have set the following properties:
<property>
<name>db.ignore.internal.links</name>
<value>false</value>
</property>
<property>
<name>db.ignore.external.links</name>
<value>false</value>
</property>
Upvotes: 1
Views: 140
Reputation: 893
There are many possible causes here; nutch-site.xml houses a great many properties.
Have you checked this one:
<property>
<name>db.max.outlinks.per.page</name>
<value>100</value>
<description>The maximum number of outlinks that we'll process for a page.
If this value is nonnegative (>=0), at most db.max.outlinks.per.page outlinks
will be processed for a page; otherwise, all outlinks will be processed.
</description>
</property>
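If your pages contain more than 100 links, that default would explain why some of them are never followed. A minimal sketch of an override, assuming you want every outlink processed and that you place it in your own conf/nutch-site.xml (going by the description above, any negative value removes the cap):

```xml
<!-- Assumed override for conf/nutch-site.xml: a negative value means
     "process all outlinks" according to the property's description. -->
<property>
  <name>db.max.outlinks.per.page</name>
  <value>-1</value>
</property>
```

Pages that were already fetched and parsed under the old limit will probably need to be crawled again before the extra links show up.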
Upvotes: 1