Reputation: 2721
I want to pass a parameter on the scrapy crawl ...
command line to be used in the rule definition of my extended CrawlSpider, like the following:
name = 'example.com'
allowed_domains = ['example.com']
start_urls = ['http://www.example.com']

rules = (
    # Extract links matching 'category.php' (but not matching 'subsection.php')
    # and follow links from them (since no callback means follow=True by default).
    Rule(SgmlLinkExtractor(allow=('category\.php', ), deny=('subsection\.php', ))),

    # Extract links matching 'item.php' and parse them with the spider's method parse_item
    Rule(SgmlLinkExtractor(allow=('item\.php', )), callback='parse_item'),
)
I want the allow attribute of the SgmlLinkExtractor to be specified via a command-line parameter.
I have googled and found that I can get the parameter value in the spider's __init__
method (a rough sketch of what I found is below), but how can I use a command-line parameter in the Rule definition?
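What I have so far looks roughly like this (just a sketch; the category argument name is only a placeholder):

from scrapy.contrib.spiders import CrawlSpider

class MySpider(CrawlSpider):
    name = 'example.com'

    def __init__(self, category=None, *args, **kwargs):
        super(MySpider, self).__init__(*args, **kwargs)
        # The value passed with -a on the command line arrives here, e.g.:
        #   scrapy crawl example.com -a category=electronics
        self.category = category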
Upvotes: 4
Views: 918
Reputation: 20748
You can build your spider's rules attribute in the __init__ method, something like:
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor

class MySpider(CrawlSpider):
    name = 'example.com'
    allowed_domains = ['example.com']
    start_urls = ['http://www.example.com']

    def __init__(self, allow=None, *args, **kwargs):
        # Build the rules from the -a argument before calling the parent
        # __init__, which compiles self.rules.
        self.rules = (
            Rule(SgmlLinkExtractor(allow=(allow,))),
        )
        super(MySpider, self).__init__(*args, **kwargs)
And you pass the allow attribute on the command line like this:
scrapy crawl example.com -a allow="item\.php"
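Note that SgmlLinkExtractor has been deprecated and later removed in newer Scrapy releases. The same pattern should work with the generic LinkExtractor; here is a rough sketch assuming a recent Scrapy version (parse_item is just a placeholder callback):

from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class MySpider(CrawlSpider):
    name = 'example.com'
    allowed_domains = ['example.com']
    start_urls = ['http://www.example.com']

    def __init__(self, allow=None, *args, **kwargs):
        # self.rules must exist before CrawlSpider.__init__ compiles the rules.
        self.rules = (
            Rule(LinkExtractor(allow=(allow,)), callback='parse_item'),
        )
        super(MySpider, self).__init__(*args, **kwargs)

    def parse_item(self, response):
        self.logger.info('Scraped %s', response.url)

The invocation is the same: scrapy crawl example.com -a allow="item\.php"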
Upvotes: 5