Reputation: 802
This question has more or less been answered before, but the existing answers are years old.
In my "project" I have 4 spiders, each dealing with a different kind of product (I'm scraping Amazon at the moment). Each product has a category: for example, to scrape "laptops" I use one spider, and to scrape clothes I use another.
So, is there a way to run a Python script that calls a different spider depending on the product I have to scrape (products are read from a txt file)?
The code would look something like this:
# Imports
def scrapyProject():
    # Get the products I want to scrape
    if productIsClothes:
        runClothesSpider()
    elif productIsGeneric:
        runGenericSpider()
I know the code above is rough; it's a sketch of the final code.
It would also help to know which imports I need for the program to work.
Upvotes: 0
Views: 1998
Reputation: 21406
You could just pick the spider class with an if statement:
import sys

from scrapy.crawler import CrawlerProcess
from project.spiders import Spider1, Spider2


def main():
    process = CrawlerProcess({})
    # Pick the spider class from the first command-line argument
    choice = sys.argv[1] if len(sys.argv) > 1 else None
    if choice == '1':
        spider_cls = Spider1
    elif choice == '2':
        spider_cls = Spider2
    else:
        print('1st argument must be either 1 or 2')
        return
    process.crawl(spider_cls)
    process.start()  # the script will block here until the crawling is finished


if __name__ == '__main__':
    main()
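Since your products come from a txt file rather than the command line, the same idea could be extended roughly like this. This is only a sketch under assumptions: I'm guessing each line of the file looks like `category;product name`, and `parse_products`, `group_by_spider`, `run`, the `registry` dict and the `products` spider argument are names I made up for illustration. Only `CrawlerProcess`, `get_project_settings` and `process.crawl(spider_cls, **kwargs)` are actual Scrapy APIs.

```python
def parse_products(lines):
    """Yield (category, product) pairs from lines like 'clothes;blue jeans'.

    The 'category;product' layout is an assumption about the txt file.
    """
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        category, _, product = line.partition(";")
        yield category.strip().lower(), product.strip()


def group_by_spider(lines, registry, default):
    """Map each spider class to the list of products it should scrape.

    `registry` maps a category name to a spider class; categories that
    are not in it fall back to `default` (e.g. a generic spider).
    """
    jobs = {}
    for category, product in parse_products(lines):
        spider_cls = registry.get(category, default)
        jobs.setdefault(spider_cls, []).append(product)
    return jobs


def run(path, registry, default):
    # Imported here so the helpers above can be used without Scrapy installed.
    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings

    with open(path, encoding="utf-8") as fh:
        jobs = group_by_spider(fh, registry, default)

    process = CrawlerProcess(get_project_settings())
    for spider_cls, products in jobs.items():
        # Keyword arguments are passed on to the spider's __init__,
        # so each spider can read its work list from self.products.
        process.crawl(spider_cls, products=products)
    process.start()  # blocks until every scheduled spider has finished
```

You would call it with something like `run('products.txt', {'clothes': ClothesSpider}, GenericSpider)`, where both spider classes are placeholders for your own. Scheduling all the crawls before a single `process.start()` matters, because the Twisted reactor behind `CrawlerProcess` cannot be restarted within the same Python process.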
Upvotes: 4