How can I combine the two spiders into just one?

Question

I have two spiders that use the same resource file and have almost the same structure.

spiderA contains:

import scrapy
import pkgutil

class StockSpider(scrapy.Spider):
    name = "spiderA"
    data = pkgutil.get_data("tutorial", "resources/webs.txt")
    data = data.decode()
    urls = data.split("\n")
    start_urls = [url + "string1" for url in urls]

    def parse(self, response):
        pass

spiderB contains:

import scrapy
import pkgutil

class StockSpider(scrapy.Spider):
    name = "spiderB"
    data = pkgutil.get_data("tutorial", "resources/webs.txt")
    data = data.decode()
    urls = data.split("\n")
    start_urls = [url + "string2" for url in urls]

    def parse(self, response):
        pass

How can I combine spiderA and spiderB, and add a switch variable so that scrapy crawl runs a different spider depending on my need?

vezunchik · Accepted Answer

Try adding a separate parameter for the spider type. You can set it by calling scrapy crawl myspider -a spider_type=second. Here is a code example:

import scrapy
import pkgutil

class StockSpider(scrapy.Spider):
    name = "myspider"

    def start_requests(self):
        if not hasattr(self, 'spider_type'):
            self.logger.error('No spider_type specified')
            return
        data = pkgutil.get_data("tutorial", "resources/webs.txt")
        data = data.decode()

        for url in data.split("\n"):
            if self.spider_type == 'first':
                url += 'first'
            if self.spider_type == 'second':
                url += 'second'
            yield scrapy.Request(url)

    def parse(self, response):
        pass
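
The suffix choice in the loop above could also be written as a lookup table, which makes an unknown spider_type fail loudly instead of silently requesting the bare URL. The sketch below is only an illustration: SUFFIXES and build_urls are hypothetical names, not part of Scrapy. It also skips blank lines, since a trailing newline in webs.txt makes split("\n") return an empty last element.

```python
# Hypothetical mapping from the -a spider_type value to the URL suffix.
SUFFIXES = {"first": "first", "second": "second"}


def build_urls(text, spider_type):
    """Return one suffixed URL per non-empty line of text."""
    try:
        suffix = SUFFIXES[spider_type]
    except KeyError:
        raise ValueError(f"Unknown spider_type: {spider_type!r}") from None
    # splitlines() copes with a trailing newline; blank lines are dropped.
    return [line.strip() + suffix for line in text.splitlines() if line.strip()]
```

Inside start_requests you would then loop over build_urls(data, self.spider_type) and yield a scrapy.Request for each URL.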

You can also create a base class and inherit from it, overriding only the variable that you add to the URL and the name (so each spider can still be called separately).
