Scrapy 2.4.0 rename images in pipeline

Question

I've only just started learning Scrapy from tutorials. I have a spider that successfully downloads images from a website, but I've been unable to rename the images using other SO answers. Most of those answers are over 4 years old and gave me deprecation warnings when I ran them, so I would like to know how to fix my pipeline to avoid such warnings.

Can someone please explain to me how I can fix my pipeline class to rename the images?

import re

from scrapy import Request
from scrapy.pipelines.images import ImagesPipeline


class ImagetestPipeline(ImagesPipeline):

    # originals are saved as full/<hex digest>.jpg
    CONVERTED_ORIGINAL = re.compile(r'^full/[0-9a-f]+\.jpg$')

    # name information coming from the spider, in each item
    # add this information to Requests() for individual images downloads
    # through "meta" dictionary
    def get_media_requests(self, item, info):
        print("get_media_requests")
        return [Request(x, meta={'image_name': item["image_names"]})
                for x in item.get('image_urls', [])]

    # this is where the image is extracted from the HTTP response
    def get_images(self, response, request, info):
        print("get_images")

        for key, image, buf in super().get_images(response, request, info):
            if self.CONVERTED_ORIGINAL.match(key):
                key = self.change_filename(key, response)
            yield key, image, buf

    def change_filename(self, key, response):
        newname = response.meta['image_name'][0]
        return f"{newname}.jpg"

    def file_path(self, request, response=None, info=None):
        """This is the method used to determine file path"""
        path = super().file_path(request, response, info)
        return path.replace('full', '')
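The reason the rename in get_images never fires can be checked without running a crawl. The sketch below assumes ImagesPipeline's default naming for originals (full/ plus the SHA1 hex digest of the URL plus .jpg) and then applies the file_path override from the code above. The helper names and the URL are hypothetical stand-ins, and the pattern has the same shape as CONVERTED_ORIGINAL.

```python
import hashlib
import re

# Same shape as the question's CONVERTED_ORIGINAL: full/<hex digest>.jpg
CONVERTED_ORIGINAL = re.compile(r'^full/[0-9a-f]+\.jpg$')


def default_image_path(url: str) -> str:
    """Hypothetical stand-in for ImagesPipeline's default file_path."""
    # Originals are stored as full/<SHA1 hex digest of the URL>.jpg
    digest = hashlib.sha1(url.encode('utf-8')).hexdigest()
    return f'full/{digest}.jpg'


def overridden_image_path(url: str) -> str:
    """The question's file_path override: strip the 'full' directory."""
    return default_image_path(url).replace('full', '')


url = 'https://example.com/images/photo.png'  # hypothetical image URL
print(default_image_path(url))     # full/ followed by 40 hex characters and .jpg
print(overridden_image_path(url))  # / followed by 40 hex characters and .jpg
print(bool(CONVERTED_ORIGINAL.match(default_image_path(url))))     # True
print(bool(CONVERTED_ORIGINAL.match(overridden_image_path(url))))  # False
```

Because super().get_images() takes its key from self.file_path(), the check in get_images only ever sees the stripped path, so change_filename is never called.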

EDIT

from scrapy.pipelines.images import ImagesPipeline


class ImagetestPipeline(ImagesPipeline):

    def process_item(self, item, spider):
        self.product_name = spider.product_name
        # Hand the item back to ImagesPipeline so it still schedules the downloads
        return super().process_item(item, spider)

    def file_path(self, request, response=None, info=None):
        # ImagesPipeline re-encodes every image as JPEG, so always use .jpg
        return f'{self.product_name}.jpg'

kynnemall · Accepted Answer

I found a solution after much searching. Using the code below in my custom pipeline, which inherits from Scrapy's ImagesPipeline, and defining image_name as a Field in my custom item, I can now rename the images as I want.

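from scrapy import Request
from scrapy.pipelines.images import ImagesPipeline

class ImagetestPipeline(ImagesPipeline):
    def get_media_requests(self, item, info):
        # Pass the item's image_name along with every download request
        return [Request(x, meta={'image_name': item["image_name"]})
                for x in item.get('image_urls', [])]

    def file_path(self, request, response=None, info=None):
        # Save the image under the name carried in the request's meta
        return f'{request.meta["image_name"]}.jpg'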
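One caveat with this approach: every URL in an item gets the same image_name, so when an item has several image_urls each download overwrites the previous file, and names containing characters such as / or : turn into subdirectories or invalid paths. Below is a minimal sketch of a hypothetical helper, build_image_path, that sanitises the name and numbers extra images. The sanitising rules are an assumption, not Scrapy behaviour.

```python
import re

# Characters that are path separators or forbidden in Windows file names
UNSAFE_CHARS = re.compile(r'[\\/:*?"<>|]+')


def build_image_path(image_name: str, index: int = 0) -> str:
    """Hypothetical helper: turn an item's image name into a safe file name.

    index is the position of the URL within item['image_urls']; only the
    first image keeps the bare name, later ones get a numeric suffix.
    """
    # Replace each run of unsafe characters with a single underscore
    safe = UNSAFE_CHARS.sub('_', str(image_name)).strip(' .')
    # Fall back to a generic name if nothing usable is left
    if not safe:
        safe = 'image'
    suffix = f'_{index}' if index else ''
    # ImagesPipeline re-encodes downloads as JPEG, so .jpg matches the content
    return f'{safe}{suffix}.jpg'


print(build_image_path('Red Shoes'))     # Red Shoes.jpg
print(build_image_path('Red Shoes', 2))  # Red Shoes_2.jpg
print(build_image_path('a/b:c'))         # a_b_c.jpg
```

To use it, get_media_requests could enumerate item.get('image_urls', []) and add a hypothetical 'image_index' key next to 'image_name' in each request's meta. file_path would then return build_image_path(request.meta["image_name"], request.meta["image_index"]).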
