Reputation: 3663
I have a spiders.py in a Scrapy project with the following spiders:
import scrapy

class OneSpider(scrapy.Spider):
    name = "s1"

    def start_requests(self):
        urls = ["http://url1.com"]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        ## Scrape stuff, put it in a dict
        yield dictOfScrapedStuff

class TwoSpider(scrapy.Spider):
    name = "s2"

    def start_requests(self):
        urls = ["http://url2.com"]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        ## Scrape stuff, put it in a dict
        yield dictOfScrapedStuff
How do I run spiders s1 and s2, and write their scraped results to s1.json and s2.json?
Upvotes: 0
Views: 389
Reputation: 21406
The scrapy crawl command runs a single spider per process, so you'd simply run two separate processes:
scrapy crawl s1 -o s1.json
scrapy crawl s2 -o s2.json
If you want to do it in the same terminal window, you'd have to either use nohup, e.g.:
nohup scrapy crawl s1 -o s1.json --logfile s1.log &
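The second spider can be started the same way; a minimal sketch, assuming you also want its log in its own file:
nohup scrapy crawl s2 -o s2.json --logfile s2.log &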
Alternatively, use the screen command:
$ screen
$ scrapy crawl s1 -o s1.json
# press Ctrl+a, then Ctrl+d to detach the screen session
$ screen
$ scrapy crawl s2 -o s2.json
# press Ctrl+a, then Ctrl+d to detach the screen session
$ screen -r   # reattach to one of your sessions to see how the spider is doing
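With two detached sessions around, screen -r on its own may just list them and ask which one to resume; a quick sketch of reattaching a specific session (the session id is whatever screen -ls reports):
$ screen -ls                 # list detached sessions
$ screen -r <session-id>     # reattach the one you want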
Personally I prefer the nohup or screen options, as they are clean and don't clutter your terminal with logging and whatnot.
Upvotes: 1