Adi Daryanani

Reputation: 66

How to pass system command line arguments to the Scrapy CrawlerProcess?

I have a single Scrapy spider that I pass command line arguments to through the scrapy crawl command. I am trying to run this spider with CrawlerProcess instead of from the command line. How can I pass all the same command line arguments to the crawler process?

scrapy crawl example -o data.jl -t jsonlines -s JOBDIR=/crawlstate

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

process = CrawlerProcess(get_project_settings())
# How do I pass arguments like -o data.jl -t jsonlines -s JOBDIR=/crawlstate here?
process.crawl('example')
process.start()

Upvotes: 2

Views: 557

Answers (1)

Tomáš Linhart

Reputation: 10220

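You can modify the project settings before passing them to the CrawlerProcess constructor. Setting each value with priority='cmdline' gives it the same precedence it would have if it came from the command line: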

...
settings = get_project_settings()
settings.set('FEED_URI', 'data.jl', priority='cmdline')
settings.set('FEED_FORMAT', 'jsonlines', priority='cmdline')
settings.set('JOBDIR', '/crawlstate', priority='cmdline')
process = CrawlerProcess(settings)
...
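If the script should accept the same options as scrapy crawl rather than hard-coding them, the mapping above can be driven by argparse. The following is only a rough sketch: parse_crawl_args and run are hypothetical helper names, not part of Scrapy, and the sketch assumes that -o and -t map to the FEED_URI and FEED_FORMAT settings used in this answer and that each -s option carries a NAME=VALUE pair.

```python
import argparse


def parse_crawl_args(argv):
    """Turn scrapy-crawl-style options into (spider_name, setting_overrides).

    Hypothetical helper: only -o, -t and -s NAME=VALUE are handled here.
    """
    parser = argparse.ArgumentParser(prog="run_spider")
    parser.add_argument("spider")
    parser.add_argument("-o", dest="feed_uri")
    parser.add_argument("-t", dest="feed_format")
    parser.add_argument("-s", dest="pairs", action="append", default=[],
                        metavar="NAME=VALUE")
    args = parser.parse_args(argv)

    overrides = {}
    # Assumed mapping, taken from the settings used in the answer above.
    if args.feed_uri is not None:
        overrides["FEED_URI"] = args.feed_uri
    if args.feed_format is not None:
        overrides["FEED_FORMAT"] = args.feed_format
    for pair in args.pairs:
        name, sep, value = pair.partition("=")
        if not sep:
            parser.error(f"-s expects NAME=VALUE, got {pair!r}")
        overrides[name] = value
    return args.spider, overrides


def run(argv):
    """Apply the parsed overrides to the project settings and start the crawl."""
    # Imported lazily so parse_crawl_args can be used without Scrapy installed.
    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings

    spider, overrides = parse_crawl_args(argv)
    settings = get_project_settings()
    for name, value in overrides.items():
        settings.set(name, value, priority="cmdline")
    process = CrawlerProcess(settings)
    process.crawl(spider)
    process.start()
```

Calling run(sys.argv[1:]) from a script would then accept something like example -o data.jl -t jsonlines -s JOBDIR=/crawlstate. Recent Scrapy releases replace FEED_URI and FEED_FORMAT with the FEEDS setting, so the mapping may need adjusting for the installed version.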

Upvotes: 3
