mina

Reputation: 21

Recrawl URLs in Nutch 1.3

I set re_crawler to fetch a site every day, but it fetches this site 3 times. Which property should I set in Nutch? Thanks.

Upvotes: 2

Views: 759

Answers (1)

jpee

Reputation: 321

You have probably found a solution yourself in the past months, but here is an answer for the community. nutch-default.xml defines 3 properties that control how often a page is re-fetched:

<property>
 <name>db.default.fetch.interval</name>
 <value>30</value>
 <description>(DEPRECATED) The default number of days between re-fetches of a page.
 </description>
</property>

<property>
 <name>db.fetch.interval.default</name>
 <value>2592000</value>
 <description>The default number of seconds between re-fetches of a page (30 days).
 </description>
</property>

<property>
 <name>db.fetch.interval.max</name>
 <value>7776000</value>
 <description>The maximum number of seconds between re-fetches of a page
 (90 days). After this period every page in the db will be re-tried, no
 matter what is its status.
 </description>
</property>

All of these can be overridden in nutch-site.xml.
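
As a rough sketch of such an override, assuming you want pages to become due for re-fetch about once a day, a nutch-site.xml could look like the following. The value 86400 (24 * 60 * 60 seconds) is my assumption for "every day", not something taken from the question, so adjust it to your schedule:

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Overrides the 30-day default from nutch-default.xml.
       86400 seconds = 1 day; this value is an assumed example. -->
  <property>
    <name>db.fetch.interval.default</name>
    <value>86400</value>
    <description>The default number of seconds between re-fetches of a page (1 day).
    </description>
  </property>
</configuration>
```

Only db.fetch.interval.default is overridden here; db.fetch.interval.max keeps its 90-day value from nutch-default.xml unless you override it as well.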

Upvotes: 3
