Reputation: 5292
Hello, I have two pipelines. The first one downloads photos:
    class ModelsPipeline(ImagesPipeline):
        def get_media_requests(self, item, info):
            for image_url in item['image_urls']:
                yield scrapy.Request(image_url)

        def file_path(self, request, response=None, info=None, *, item=None):
            image_url_hash = hashlib.shake_256(request.url.encode()).hexdigest(5)
            image_filename = f'{item["name"]}/{image_url_hash}.jpg'
            return image_filename

        def item_completed(self, results, item, info):
            image_paths = [x['path'] for ok, x in results if ok]
            for image in image_paths:
                file_extension = os.path.splitext(image)[1]
                img_path = f'{IMAGES_STORE}{image}'
                md5 = hashlib.md5(open(img_path, 'rb').read()).hexdigest()
                img_destination = f'{IMAGES_STORE}{item["name"]}/{md5}{file_extension}'
                os.rename(img_path, img_destination)
            return item
The second one stores that information in the database:
    class DatabasePipeline:
        def open_spider(self, spider):
            self.client = db_connect()

        def close_spider(self, spider):
            self.client.close()

        def process_item(self, item, spider):
            self.client.upsert(item)
            # process_item should return the item so later pipelines still receive it
            return item
The item_completed method in the first pipeline produces a name and a path that I want to pass to the second pipeline so they can be stored in the database, but I cannot access that data there.
How can I do that?
Thanks
Upvotes: 2
Views: 344
Reputation: 21
I ran into the same problem recently. You need to enable both pipelines and give DatabasePipeline a higher number than ModelsPipeline, as in the following settings. Scrapy runs pipelines from the lowest number to the highest, so a higher number means the pipeline runs later.
The item is then processed first by ModelsPipeline and then by DatabasePipeline. Remember to return the item from item_completed in ModelsPipeline.
    ITEM_PIPELINES = {
        "project_name.pipelines.ModelsPipeline": 300,
        "project_name.pipelines.DatabasePipeline": 302,
    }
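If it helps to see the ordering rule spelled out, here is a small sketch that sorts the same settings by their number. It only illustrates the rule and is not Scrapy's own code; `pipeline_order` is a hypothetical helper, and the dictionary is deliberately written out of order to show that the number, not the position, decides.

```python
# Same entries as in the settings, deliberately listed out of order.
ITEM_PIPELINES = {
    "project_name.pipelines.DatabasePipeline": 302,
    "project_name.pipelines.ModelsPipeline": 300,
}


def pipeline_order(pipelines):
    """Hypothetical helper: list pipeline paths from lowest to highest number."""
    return [path for path, _ in sorted(pipelines.items(), key=lambda kv: kv[1])]


# ModelsPipeline (300) comes before DatabasePipeline (302).
run_order = pipeline_order(ITEM_PIPELINES)
```

So whatever ModelsPipeline writes onto the item before returning it is what DatabasePipeline receives.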
Upvotes: 0
Reputation: 2116
You can add the name and the path to the item in the ModelsPipeline:
    item['name_from_pipeline'] = name
    item['path_from_pipeline'] = path
    return item
In process_item of the DatabasePipeline you can access it:
    name = item['name_from_pipeline']
    path = item['path_from_pipeline']
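Putting this together with the renaming from the question, a minimal sketch could look like the following. It assumes the item behaves like a dict and that `results` has the shape Scrapy passes to item_completed, a list of (success, info) tuples. `attach_image_paths` and the `stored_image_paths` field are hypothetical names chosen for illustration, not part of Scrapy.

```python
import hashlib
import os


def attach_image_paths(results, item, images_store):
    """Rename each downloaded image to its MD5 and record the new paths on the item.

    info['path'] is expected to be relative to images_store.
    'stored_image_paths' is a hypothetical field name.
    """
    stored = []
    for ok, info in results:
        if not ok:
            continue  # skip failed downloads
        src = os.path.join(images_store, info["path"])
        ext = os.path.splitext(src)[1]
        with open(src, "rb") as fh:
            md5 = hashlib.md5(fh.read()).hexdigest()
        rel_dest = os.path.join(item["name"], f"{md5}{ext}")
        os.replace(src, os.path.join(images_store, rel_dest))
        stored.append(rel_dest)
    # Anything stored on the item here travels with it to the next pipeline.
    item["stored_image_paths"] = stored
    return item
```

In ModelsPipeline.item_completed you would then `return attach_image_paths(results, item, IMAGES_STORE)`, and DatabasePipeline.process_item can read `item["stored_image_paths"]`. If the item is a scrapy.Item subclass, the new field has to be declared with scrapy.Field() first, otherwise the assignment raises a KeyError.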
Upvotes: 0