Reputation: 1
I have a couple of item pipelines that are set to run one after the other, like this:
SETTINGS = {
    ...,
    "ITEM_PIPELINES": {
        "pipelines.my_spider_pipeline.MySpiderPipeline": 1,
        "pipelines.my_images_pipeline.MyImagesPipeline": 2,
    },
}
This doesn't seem to work as expected, and I'm not sure whether that's because of the code in pipelines.my_spider_pipeline.MySpiderPipeline:
from scrapy import signals


class MySpiderPipeline(object):
    def __init__(self, stats):
        self.stats = stats

    @classmethod
    def from_crawler(cls, crawler):
        # cls is the pipeline class, so this builds a pipeline instance
        pipeline = cls(crawler.stats)
        crawler.signals.connect(pipeline.item_scraped, signal=signals.item_scraped)
        return pipeline
The stats argument receives the crawler's StatsCollector instance.
Now, whenever my code is executed, it first goes to from_crawler but then jumps to another function defined in MyImagesPipeline. I need it to go to process_item in MySpiderPipeline instead, because that is where I insert data into the database, and I need the id of the database record to be available by the time the item reaches MyImagesPipeline.
What's to be done for that? I think this code isn't flexible at all, and any possible change would mean moving a lot of code. Open to any suggestion.
I tried not using from_crawler, but that didn't change anything.
Upvotes: -2
Views: 52
Reputation: 2120
First, you need to make id part of your item definition. In the process_item method of the MySpiderPipeline class, obtain the id of the record inserted into the database and save it as one of the item's fields.
class MySpiderPipeline:
    def process_item(self, item, spider):
        # insert item in db and get back the id inserted
        # code here
        # add the id returned to the item and return it
        item["id"] = "id"  # placeholder: use the id returned by the database
        return item
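As a minimal sketch of that insert-and-store step, assuming a SQLite database and a hypothetical items table with a title column (neither appears in the question), the pipeline could read cursor.lastrowid after the insert and write it to the item. A plain dict stands in for the Scrapy item, and the calls at the bottom only imitate what Scrapy would do with the pipeline.

```python
import sqlite3


class MySpiderPipeline:
    """Sketch: insert each item into SQLite and store the new row id on the item."""

    def open_spider(self, spider):
        # Assumption: an in-memory database and an "items" table, for illustration only.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE items (id INTEGER PRIMARY KEY AUTOINCREMENT, title TEXT)"
        )

    def close_spider(self, spider):
        self.conn.close()

    def process_item(self, item, spider):
        cursor = self.conn.execute(
            "INSERT INTO items (title) VALUES (?)", (item["title"],)
        )
        self.conn.commit()
        # lastrowid is the id of the row that was just inserted.
        item["id"] = cursor.lastrowid
        return item


# Stand-in for Scrapy: open the pipeline, pass items through it, close it.
pipeline = MySpiderPipeline()
pipeline.open_spider(spider=None)
first = pipeline.process_item({"title": "first"}, spider=None)
second = pipeline.process_item({"title": "second"}, spider=None)
pipeline.close_spider(spider=None)
```

With another database driver the way to get the generated id will differ, so treat lastrowid as specific to this SQLite-based sketch.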
In the process_item method of the MyImagesPipeline class, retrieve the id value that you set in the MySpiderPipeline class and use it as applicable.
class MyImagesPipeline:
    def process_item(self, item, spider):
        # retrieve the id value that is part of the item
        record_id = item["id"]
        # use the id value as needed
        # code here
        return item
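To illustrate why the numbers in ITEM_PIPELINES matter, here is a rough simulation (not Scrapy's actual code) of an item passing through the pipelines in ascending order of their number, with each process_item receiving whatever the previous one returned. The run_pipelines helper, the fake id counter, and the image_folder field are hypothetical names used only for this sketch.

```python
import itertools


class MySpiderPipeline:
    _ids = itertools.count(1)  # hypothetical stand-in for database-generated ids

    def process_item(self, item, spider):
        item["id"] = next(self._ids)
        return item


class MyImagesPipeline:
    def process_item(self, item, spider):
        # The id set by the earlier pipeline is available here.
        item["image_folder"] = f"images/{item['id']}"
        return item


def run_pipelines(item, pipelines, spider=None):
    """Pass the item through each pipeline, lowest order value first."""
    for pipeline in sorted(pipelines, key=pipelines.get):
        item = pipeline.process_item(item, spider)
    return item


# Declared out of order on purpose: the order value decides, not the position.
pipelines = {MyImagesPipeline(): 2, MySpiderPipeline(): 1}
result = run_pipelines({"title": "example"}, pipelines)
```

If MyImagesPipeline subclasses Scrapy's ImagesPipeline, the same item (including its id) should also be what reaches get_media_requests and item_completed, so the id could be read there in the same way.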
Upvotes: 0