Reputation: 11
I'm using RCrawler to crawl ~300 websites. The sites vary widely in size: some are small (a dozen or so pages) and others are large (thousands of pages per domain). Crawling the large ones is very time-consuming, and for my research purposes the added value of additional pages decreases once I already have a few hundred.
So: is there a way to stop the crawl once a certain number of pages has been collected?
I know I can limit the crawl with MaxDepth, but even at MaxDepth = 2 this is still an issue, and MaxDepth = 1 is not desirable for my research. Also, I'd prefer to keep MaxDepth high so that the smaller websites do get crawled completely.
Thanks a lot!
Upvotes: 1
Views: 169
Reputation: 11
How about implementing a custom function for the FUNPageFilter parameter of the Rcrawler function? The custom function would check the number of files already saved in DIR and return FALSE once there are too many.
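For example, a minimal sketch along those lines (max_pages, out_dir, and page_limit_filter are illustrative names of my own; Rcrawler writes each site into its own subfolder of DIR, so the count here is recursive and assumes a fresh DIR per crawl):

```r
library(Rcrawler)

# Illustrative values; adjust to your own setup.
max_pages <- 500          # stop collecting once this many pages are saved
out_dir   <- "./crawled"  # same path passed to DIR below; Rcrawler creates a
                          # per-site subfolder inside it, hence the recursive count

# FUNPageFilter is called for each fetched page; this sketch ignores the page
# object and returns FALSE once out_dir already holds max_pages saved files,
# so no further pages are collected for this crawl.
page_limit_filter <- function(page) {
  length(list.files(out_dir, recursive = TRUE)) < max_pages
}

Rcrawler(Website       = "https://www.example.com",
         DIR           = out_dir,
         MaxDepth      = 10,
         FUNPageFilter = page_limit_filter)
```

One caveat I haven't verified: returning FALSE may only stop pages from being stored rather than stopping the crawler from following links, so very large sites could still take a while to finish, but the number of collected pages stays bounded.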
Upvotes: 0