Nutch not crawling entire website

Question

I am using Nutch 2.3.1.

I run the following commands to crawl a site:

./nutch inject ../urls/seed.txt
./nutch generate -topN 2500
./nutch fetch -all

The problem is that Nutch only crawls the first URL (the one specified in seed.txt). The stored data is only the HTML from that first page.

None of the other URLs accumulated by the generate command are actually crawled.

I cannot get Nutch to crawl the other generated URLs, and I cannot get it to crawl the entire website. Which options do I need to use to crawl an entire site?

Does anyone have any insights or recommendations?

Thank you so much for your help

