Reputation: 427
I am running Nutch 1.4 with Solr 4.10 to index a number of sites. My crawl starts from a number of seed pages containing several hundred links. I am currently running with
-topN 400 -depth 20
With these settings it takes 5-7 hours to complete the crawl. I would like each individual run of "nutch crawl" to take less time, but I need to be sure that every page is eventually crawled. Can I reduce either my -topN or -depth value and still be sure all pages will be crawled?
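For reference, the command I run is essentially the following (the seed directory, crawl directory and Solr URL are placeholders here rather than my real values):

    bin/nutch crawl urls -dir crawl -solr http://localhost:8983/solr/ -depth 20 -topN 400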
Upvotes: 0
Views: 145
Reputation: 4854
Changing depth (which should really have a different name: it is the number of iterations, which is often the same as the depth but not necessarily) won't make much of a difference, as the crawl stops iterating as soon as there are no more URLs to fetch. topN limits the total number of URLs per segment: a lower value means more iterations will be done, but on the whole it shouldn't change how long your crawl takes. To put rough numbers on it, if 4,000 URLs are due for fetching, -topN 400 gets through them in about 10 rounds while -topN 200 needs about 20, but it is the same 4,000 fetches either way.
There are many factors affecting the speed of a crawl (see the Nutch wiki), but it is mostly a matter of host diversity and politeness. I'd recommend running Nutch in pseudo-distributed mode and using the Hadoop UI to understand which steps take the time, and take it from there.
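For instance, assuming the standard 1.x layout, if you build the job file with ant and launch the same crawl from runtime/deploy instead of runtime/local, each step runs as a Hadoop job and the JobTracker web UI (http://localhost:50030 on a default Hadoop 1.x pseudo-distributed setup) will show where the time goes:

    runtime/deploy/bin/nutch crawl urls -dir crawl -solr http://localhost:8983/solr/ -depth 20 -topN 400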
PS: that's a very old version of Nutch. Maybe time to upgrade to a more recent one?
Upvotes: 0