Violet

Reputation: 33

Scrapy image pipeline: naming files using other crawled info

Is there a way to name a downloaded image using other information (text) that the spider extracts? For example, in this case I want each image to be named after the article title and the article's published date, both of which I scrape in the spider:

spider file

# lines of code 

def parse(self, response):

    # lines of code 

    yield {
        'date': date,
        'title': article_title,
        'image_urls': clean_urls,
    }

pipelines.py

from scrapy.pipelines.images import ImagesPipeline

class customImagesPipeline(ImagesPipeline):
    def file_path(self, request, response=None, info=None, *, item=None):
        return f"images/{request.url.split('/')[-1]}"

Upvotes: 2

Views: 222

Answers (1)

Patrick Klein

Reputation: 1191

One way to go about this is to override the get_media_requests method and store the image name in the meta attribute of the image request, so that you can read it back in the file_path method.

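The following example works if you pass a single image URL as a string to image_urls (for a list of URLs you would need one Request per URL, each with a distinct name, otherwise the files would overwrite each other):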

from scrapy.http import Request
from scrapy.pipelines.images import ImagesPipeline


class customImagesPipeline(ImagesPipeline):
    def get_media_requests(self, item, info):
        # Attach the desired file name to the request so file_path can read it.
        return Request(
            item["image_urls"],
            meta={
                "image_name": f"{item['title']}_{item['date']}",
            },
        )

    def file_path(self, request, response=None, info=None, *, item=None) -> str:
        # Same signature as in the question: item is a keyword-only argument.
        return f"images/{request.meta['image_name']}.jpg"
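
Two practical issues with building the name directly from the title and date: article titles often contain characters that are not safe in file names (slashes, colons, quotes), and several images from the same article would all get the same name. Below is a minimal sketch of a helper that addresses both, assuming the title and date are plain strings. The names slugify and build_image_path are hypothetical and not part of Scrapy; the extension is fixed to .jpg because ImagesPipeline converts downloaded images to JPEG.

```python
import hashlib
import re


def slugify(text: str, max_length: int = 80) -> str:
    """Reduce arbitrary text to a lowercase, filesystem-safe slug."""
    # Collapse every run of non-alphanumeric characters into one hyphen.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", str(text)).strip("-").lower()
    # Truncate long titles and fall back to a placeholder if nothing is left.
    return slug[:max_length].rstrip("-") or "untitled"


def build_image_path(title: str, date: str, url: str, folder: str = "images") -> str:
    """Hypothetical helper: combine title, date and a short URL hash into a path."""
    # A short hash of the URL keeps several images of one article apart.
    url_hash = hashlib.sha1(url.encode("utf-8")).hexdigest()[:8]
    return f"{folder}/{slugify(title)}_{slugify(date)}_{url_hash}.jpg"


if __name__ == "__main__":
    print(build_image_path("My Article: Part 1/2", "2021-05-01",
                           "https://example.com/img/a.png"))
```

Inside the pipeline, file_path could then return build_image_path(request.meta["title"], request.meta["date"], request.url) if those two values are stored in meta, or read them from the item keyword argument when it is passed.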

Upvotes: 2
