Reputation: 229
What I would like to do is pretty basic, I think, but I couldn't find a way to implement it.
I am trying to use the FilesPipeline in Scrapy to download a file (e.g. Image1.jpg) and save it to a path derived from the item that placed the request in the first place (e.g. item.name).
It is pretty similar to this question here, though I want to pass the item.name (or item.something) field as an argument, so that each file is saved to a custom path depending on item.name.
The path is defined in the persist_file
function, but that function does not have access to the item itself, just the file request and response.
def get_media_requests(self, item, info):
    return [Request(x) for x in item.get(self.FILES_URLS_FIELD, [])]
I can also see above that the request is made here in order to feed the files into the pipeline, but is there a way to pass an extra argument along, so that I can later use it in the file_downloaded and, afterwards, the persist_file function?
As a last resort it would be pretty simple to rename/move the file in one of the following pipelines after it has been downloaded, but that seems sloppy, doesn't it?
I am using the code implemented here as a custom pipeline.
Can anyone help please? Thank you in advance :)
Upvotes: 2
Views: 1046
Reputation: 18799
Create your own pipeline (inheriting from FilesPipeline) and override its process_item method so that it passes the current item on to the other functions:
# requires: from scrapy.utils.misc import arg_to_iter
#           from twisted.internet.defer import DeferredList
def process_item(self, item, spider):
    info = self.spiderinfo
    requests = arg_to_iter(self.get_media_requests(item, info))
    # identical to the stock implementation, except that the item is
    # also handed to _process_request
    dlist = [self._process_request(r, info, item) for r in requests]
    dfd = DeferredList(dlist, consumeErrors=1)
    return dfd.addCallback(self.item_completed, item, info)
Then you need to override _process_request as well, and keep passing the item argument down so it can be used when building the file path.
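To make that concrete, here is a minimal sketch of one way the extra argument could reach the path-building code, combined with the process_item shown above. It is only an illustration, not the pipeline from the code you linked: the class name ItemAwareFilesPipeline and the meta key file_item_name are made up, the base signatures _process_request(request, info) and file_path(request, response=None, info=None) are assumed from a Scrapy 1.x FilesPipeline, and instead of threading item through every intermediate method the sketch simply stashes the needed field on request.meta and reads it back in file_path.

import os

from scrapy.pipelines.files import FilesPipeline


class ItemAwareFilesPipeline(FilesPipeline):  # hypothetical name

    # process_item is overridden exactly as in the snippet above,
    # passing item on to _process_request

    def _process_request(self, request, info, item):
        # accept the extra argument added in process_item, stash the field
        # we need on the request, then delegate to the stock implementation
        request.meta['file_item_name'] = item.get('name')  # hypothetical meta key
        return super(ItemAwareFilesPipeline, self)._process_request(request, info)

    def file_path(self, request, response=None, info=None):
        # prefix the default path with the per-item folder carried on the request
        default_path = super(ItemAwareFilesPipeline, self).file_path(
            request, response=response, info=info)
        folder = request.meta.get('file_item_name', 'misc')
        return os.path.join(folder, os.path.basename(default_path))

With something like this, a file requested by an item whose name field is "Foo" would end up under Foo/ inside FILES_STORE instead of the default full/ folder.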
Upvotes: 1