Rambo

Reputation: 11

Nutch 1.14 - not crawling all links in the page

I have Nutch 1.14 working with Solr 6.4.2, but Nutch is not crawling (following through) all of the links in the page. These are the link-related properties I have set:

<property>
  <name>db.ignore.internal.links</name>
  <value>false</value>
</property>
<property>
  <name>db.ignore.external.links</name>
  <value>false</value>
</property>

Upvotes: 1

Views: 140

Answers (1)

Tony Friz

Reputation: 893

There are many possible causes here; nutch-site.xml houses a great many properties.

Have you checked this one:

<property>
   <name>db.max.outlinks.per.page</name>
   <value>100</value>
   <description>The maximum number of outlinks that we'll process for a page.
       If this value is nonnegative (>=0), at most db.max.outlinks.per.page outlinks
       will be processed for a page; otherwise, all outlinks will be processed.
   </description>
</property>
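
If that limit turns out to be the cause, a minimal sketch of an override follows. It assumes you place it inside the <configuration> element of your conf/nutch-site.xml; the value -1 is only an illustrative choice, since the description above says any negative value means all outlinks are processed.

```xml
<!-- Sketch only: override in conf/nutch-site.xml, inside <configuration>. -->
<!-- A negative value lifts the per-page outlink cap, per the property description. -->
<property>
  <name>db.max.outlinks.per.page</name>
  <value>-1</value>
</property>
```

Pages whose outlinks were already cut off at the old limit would likely need to be fetched and parsed again before the extra links show up in the crawl.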

Upvotes: 1
