yangmillstheory

Reputation: 1065

Scrapy - keep spider open indefinitely

I'm planning to have a daemon CrawlWorker (subclassing multiprocessing.Process) that monitors a queue for scrape requests.

The responsibility of this worker is to take scrape requests from the queue and feed them to spiders. To avoid implementing batching logic (like waiting for N requests before creating a new spider), would it make sense to keep all my spiders alive, feed more scrape requests to each spider whenever it's idle, and keep them open even when there are no more scrape requests?

What would be the best, simplest, and most elegant way to implement this? Given the start_urls attribute, it seems that a spider is meant to be instantiated with an initial work list, do its work, then die.

I'm thinking of listening for spider_closed, but is there an exception I can raise to keep the spider open?
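For reference, here's a rough sketch of the kind of worker I have in mind. CrawlWorker and MySpider are just placeholder names, and the spider doesn't actually consume the queue yet:

```python
import multiprocessing

import scrapy
from scrapy.crawler import CrawlerProcess


class MySpider(scrapy.Spider):
    # Placeholder spider; in practice it would pull work from the queue.
    name = "my_spider"

    def __init__(self, request_queue=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.request_queue = request_queue


class CrawlWorker(multiprocessing.Process):
    """Daemon process that owns a queue of scrape requests and a crawl."""

    def __init__(self, request_queue):
        super().__init__(daemon=True)
        self.request_queue = request_queue

    def run(self):
        # The Twisted reactor has to run inside this process, so the
        # CrawlerProcess is created here rather than in __init__().
        process = CrawlerProcess(settings={"LOG_LEVEL": "INFO"})
        process.crawl(MySpider, request_queue=self.request_queue)
        process.start()  # blocks until the crawl finishes
```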

Upvotes: 1

Views: 774

Answers (1)

yangmillstheory

Reputation: 1065

So I think the best way is to connect to the signals.spider_idle signal and raise DontCloseSpider from the handler; the reference is in the Scrapy signals documentation.
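A minimal sketch of that approach, assuming the spider is handed something like a multiprocessing.Queue of URLs (the queue wiring and names are illustrative, not part of Scrapy itself):

```python
import scrapy
from scrapy import signals
from scrapy.exceptions import DontCloseSpider


class QueueSpider(scrapy.Spider):
    """Spider that stays alive and pulls new URLs from a shared queue."""

    name = "queue_spider"

    def __init__(self, request_queue=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.request_queue = request_queue  # e.g. a multiprocessing.Queue of URLs

    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        spider = super().from_crawler(crawler, *args, **kwargs)
        # Ask to be notified whenever the spider runs out of pending requests.
        crawler.signals.connect(spider.handle_idle, signal=signals.spider_idle)
        return spider

    def handle_idle(self):
        # Feed any queued URLs back into the engine...
        while self.request_queue is not None and not self.request_queue.empty():
            url = self.request_queue.get()
            # Note: on Scrapy versions before 2.10, engine.crawl() also takes
            # the spider as a second argument.
            self.crawler.engine.crawl(scrapy.Request(url, callback=self.parse))
        # ...and refuse to close, even if the queue was empty.
        raise DontCloseSpider

    def parse(self, response):
        yield {"url": response.url, "title": response.css("title::text").get()}
```

Because the idle handler raises DontCloseSpider unconditionally, the spider never shuts down on its own; it just waits for the next idle signal and checks the queue again.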

Upvotes: 2
