Reputation: 98
I would like to pass arguments to my spider in order to search a site based on the input, but I am struggling to set instance variables. It seems that __init__ is getting called twice: the first time it uses the arguments I pass, and the second time it seems to be called by a Scrapy function that doesn't pass along my input and resets self.a and self.b to the default value 'f'.
I read in another post that Scrapy automatically sets any passed arguments as instance attributes, but I have not found a way to access them.
Is there a solution to this, or an easier way that I am missing?
import scrapy
from scrapy_splash import SplashRequest
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

class PracticeSpider(scrapy.Spider):
    name = 'practice'

    def __init__(self, a='f', b='f', *args, **kwargs):
        super(PracticeSpider, self).__init__(*args, **kwargs)
        self.a = a
        self.b = b
        print(self.a)
        print(self.b)

    def start_requests(self):
        print(self.a)
        print(self.b)
        yield SplashRequest(''.join(["https://www.google.com/search?q=",
                                     self.a, "+", self.b]),
                            self.practice_parse, args={'wait': 0.5})

    def practice_parse(self, response):
        pass

# list of crawlers
TO_CRAWL = [PracticeSpider]
# crawlers that are running
RUNNING_CRAWLERS = []

process = CrawlerProcess(get_project_settings())
for spider in TO_CRAWL:
    process.crawl(spider(a='first', b='second'))
process.start()
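If I read the CrawlerProcess docs correctly, process.crawl() expects a spider class (or its name) plus keyword arguments and instantiates the spider itself through from_crawler, so passing an already-built instance would explain the second __init__ call with the defaults. A sketch of the call style I think is intended, using my spider above:

# process.crawl() builds the spider itself from the class and
# the keyword arguments, so no instance is created by hand:
process = CrawlerProcess(get_project_settings())
process.crawl(PracticeSpider, a='first', b='second')
process.start()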
Upvotes: 0
Views: 884
Reputation: 11
You may want to look into the meta argument of Request, which is a dictionary:
def some_function(self, response):
    ...
    yield Request(url=page,
                  callback=self.parse_page,
                  meta={'var1': "value1", 'var2': "value2"})
Then, inside the parse_page function, you can retrieve your variables as follows:
def parse_page(self, response):
    ...
    var1 = response.meta["var1"]
    var2 = response.meta["var2"]
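For completeness, a minimal self-contained sketch of the round trip; the URLs and spider name here are placeholders, not anything from the question:

import scrapy

class MetaDemoSpider(scrapy.Spider):
    name = 'meta_demo'
    start_urls = ['https://example.com/']  # placeholder URL

    def parse(self, response):
        # attach values to the outgoing request; Scrapy copies
        # request.meta onto the response handed to the callback
        yield scrapy.Request(url='https://example.com/page',
                             callback=self.parse_page,
                             meta={'var1': 'value1', 'var2': 'value2'})

    def parse_page(self, response):
        # read the values back off response.meta
        var1 = response.meta['var1']
        var2 = response.meta['var2']
        self.logger.info('var1=%s var2=%s', var1, var2)

On newer Scrapy releases (1.7 and later), cb_kwargs is the recommended channel for passing your own data to callbacks, leaving meta to the framework and middlewares.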
Upvotes: 1