gunesevitan

Reputation: 965

Scrapy - Stopping crawler when duplicate item encountered

There are lots of websites where I have to hard-code page following (incrementing the page number after crawling the items on each page), and some of those websites loop back to page 1 after the last page. For example, if a website has 25 pages of items, requesting the 26th page yields a response containing the first page.

At that point Scrapy's duplicate filter works fine and the items aren't scraped again, but the crawler keeps running. Is there any way to stop the crawl when the duplicate filter is triggered like this?

I don't want to hard-code the page number like this, since it can change over time:

if self.page < 25:
    yield scrapy.Request(...)

Upvotes: 1

Views: 309

Answers (1)

Gallaecio

Reputation: 3857

  1. Configure your request not to be filtered out by the duplicate filter (pass dont_filter=True to the request constructor)

  2. Use the request callback to stop the crawler (raise scrapy.exceptions.CloseSpider) when response.url is unexpectedly the URL of the first page (see the sketch below)
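
A minimal sketch of how the two steps could fit together, assuming a hypothetical site paginated as https://example.com/items?page=N (where the wrapped-around request redirects to page 1) and a hypothetical .item selector; the real URL scheme and selectors will differ:

import scrapy
from scrapy.exceptions import CloseSpider


class ItemsSpider(scrapy.Spider):
    name = "items"
    # Hypothetical paginated URL scheme; adjust for the real site.
    base_url = "https://example.com/items?page={}"
    start_urls = [base_url.format(1)]

    def parse(self, response):
        requested_page = response.meta.get("page", 1)

        # If we asked for a page beyond 1 but got the first page back,
        # the pagination has wrapped around: stop the spider.
        if requested_page > 1 and response.url == self.base_url.format(1):
            raise CloseSpider("pagination wrapped around to the first page")

        for item in response.css(".item"):  # hypothetical selector
            yield {"title": item.css("::text").get()}

        # dont_filter=True lets the wrapped-around request reach this
        # callback instead of being dropped by the duplicate filter.
        next_page = requested_page + 1
        yield scrapy.Request(
            self.base_url.format(next_page),
            callback=self.parse,
            meta={"page": next_page},
            dont_filter=True,
        )

This way the stopping condition comes from the site's own behaviour rather than a hard-coded last page number.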

Upvotes: 1
