Reputation: 865
I need to create a spider that crawls some data from a web site. Part of the data is an external URL.
I have already created the spider that crawls the data from the root site, and now I want to write the spider for the external web pages.
I was thinking of creating a CrawlSpider that uses the SgmlLinkExtractor to follow some specific links on each external web page.
What is the recommended way to communicate the list of start_urls to the second spider?
My idea is to generate a JSON file with the items and to read the URL attribute in the start_requests of the second spider, something like the sketch below.
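A rough sketch of that idea (the items.json path and the external_url field name are just placeholders for whatever my first spider actually exports):

import json

from scrapy.spider import BaseSpider
from scrapy.http import Request

class ExternalSpider(BaseSpider):
    name = "external"

    def start_requests(self):
        # items.json is assumed to be produced by the first spider,
        # e.g. with: scrapy crawl first_spider -o items.json -t json
        with open("items.json") as f:
            items = json.load(f)
        for item in items:
            # "external_url" is a placeholder field name
            yield Request(item["external_url"], callback=self.parse)

    def parse(self, response):
        # parse the external page here
        pass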
Upvotes: 1
Views: 1548
Reputation: 59664
I have already created the spider that crawls the data from the root site, and now I want to write the spider for the external web pages.
Save these external page URLs to a database.
What is the recommended way to communicate the list of start_urls to the second spider?
Override BaseSpider.start_requests in your other spider and create the requests from the URLs you get from the database.
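For example, a minimal sketch with sqlite (the urls.db file and the external_urls table are placeholders; the first spider is assumed to have stored one URL per row):

import sqlite3

from scrapy.spider import BaseSpider
from scrapy.http import Request

class ExternalPagesSpider(BaseSpider):
    name = "external_pages"

    def start_requests(self):
        # read the urls the first spider saved and turn them into requests
        conn = sqlite3.connect("urls.db")
        try:
            for (url,) in conn.execute("SELECT url FROM external_urls"):
                yield Request(url, callback=self.parse)
        finally:
            conn.close()

    def parse(self, response):
        # parse the external page here
        pass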
Upvotes: 2
Reputation: 2003
The question is pretty vague, but here is one way:
from scrapy.spider import BaseSpider
from scrapy.http import Request

class PracticeSpider(BaseSpider):
    name = "project_name"
    # note: the offsite middleware filters requests to domains not listed
    # here, so add the external domains (or drop this attribute entirely)
    allowed_domains = ["my_domain.org"]

    def start_requests(self):
        start_url = "The First Page URL"
        return [Request(start_url, callback=self.parse)]

    def parse(self, response):
        # parse the first page itself
        yield self.pageParser(response)
        # grab the external URLs you want to follow
        ext_urls = ...
        for url in ext_urls:
            yield Request(url, callback=self.pageParser)

    def pageParser(self, response):
        # parse the page and build the items
        return items
There is also a meta dict argument on Request that might be of some help.
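For instance, a minimal sketch of carrying data from one page to the callback of the next (the source_url key and the URLs are made up):

from scrapy.spider import BaseSpider
from scrapy.http import Request

class MetaExampleSpider(BaseSpider):
    name = "meta_example"
    start_urls = ["http://my_domain.org/"]

    def parse(self, response):
        # attach data from this page to the request for the external page
        yield Request("http://external.example.com/page",
                      callback=self.pageParser,
                      meta={"source_url": response.url})

    def pageParser(self, response):
        # the data travels with the request and is available here
        self.log("came from %s" % response.meta["source_url"])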
Upvotes: 0