Manuel

Reputation: 802

Running a scrapy program from another python script

This question has been kind of answered before, but the answers are years old.

In my "project" I have 4 spiders, and each one of them deals with a different kind of product I encounter (scraping Amazon ATM). Each product has a category: for example, if I want to scrape "laptops" I use one scraper, but if the objective is to scrape clothes, I have another one.

So, is there a way to run a Python script that calls a different spider depending on the product I have to scrape (products are read from a txt file)?

The code would look something like this:

# Imports

def scrapyProject():

    # Get the products I want to scrape
    if productIsClothes:
        runClothesSpider()
    elif productIsGeneric:
        runGenericSpider()

I know the previous code is rough; it's kind of a sketch of the final code.

It would also help to know which imports I need for the program to work.

Upvotes: 0

Views: 1998

Answers (1)

Granitosaurus

Reputation: 21406

You could just select the spider class with an if statement:

import sys

import scrapy
from scrapy.crawler import CrawlerProcess

from project.spiders import Spider1, Spider2

def main():
    process = CrawlerProcess({})

    if sys.argv[1] == '1':
        spider_cls = Spider1
    elif sys.argv[1] == '2':
        spider_cls = Spider2
    else:
        print('1st argument must be either 1 or 2')
        return
    process.crawl(spider_cls)
    process.start() # the script will block here until the crawling is finished

if __name__ == '__main__':
    main()
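Since your categories come from a txt file rather than a command-line argument, a dictionary that maps category names to spider classes scales better than a chain of ifs. This is only a sketch: `ClothesSpider` and `GenericSpider` below are placeholder names standing in for your own `scrapy.Spider` subclasses, and the stripping/lowercasing assumes one category per line in the file.

```python
# Placeholder classes: in your project these would be scrapy.Spider subclasses.
class ClothesSpider:
    name = "clothes"

class GenericSpider:
    name = "generic"

# Map a category string (as read from the txt file) to a spider class.
SPIDER_FOR_CATEGORY = {
    "clothes": ClothesSpider,
}

def pick_spider(category):
    """Return the spider class for a category, falling back to the generic one."""
    return SPIDER_FOR_CATEGORY.get(category.strip().lower(), GenericSpider)
```

You would then read each line of the file and pass the chosen class to the crawler, e.g. `process.crawl(pick_spider(line))`, exactly as `process.crawl(spider_cls)` is used above.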

Upvotes: 4
