Reputation: 66
I have a single Scrapy spider that I pass command-line arguments to using the scrapy crawl command. I am trying to run this spider using CrawlerProcess instead of the command line. How can I pass all the same command-line arguments to this crawler process?
scrapy crawl example -o data.jl -t jsonlines -s JOBDIR=/crawlstate
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
process = CrawlerProcess(get_project_settings())
process.crawl('example')  # How do I pass arguments like -o data.jl -t jsonlines -s JOBDIR=/crawlstate here?
process.start()
Upvotes: 2
Views: 557
Reputation: 10220
You can modify your project settings before you pass them to the CrawlerProcess constructor:
...
settings = get_project_settings()
settings.set('FEED_URI', 'data.jl', priority='cmdline')
settings.set('FEED_FORMAT', 'jsonlines', priority='cmdline')
settings.set('JOBDIR', '/crawlstate', priority='cmdline')
process = CrawlerProcess(settings)
...
Upvotes: 3