Orlando

Reputation: 11

Renaming downloaded images in Scrapy

I am very new to Scrapy, so even very basic things are difficult for me. My problem is that I can't rename my downloaded images. I copied part of my code from this website: "http://scrapingauthority.com/scrapy-download-images/", but it doesn't work. My spider's code is this:

from scrapy import Request, Spider
from Imagenes.items import ImagenesItem

class AuthorSpider(Spider):
    name = 'imagenpruebarenombrar'
    start_urls = [
        "http://quotes.toscrape.com/",        
    ]

    def parse(self, response):

        item = ImagenesItem()
        img_urls = [
            "http://automationpractice.com/img/p/5/5-large_default.jpg",
            "http://automationpractice.com/img/p/6/6-large_default.jpg",
            "http://automationpractice.com/img/p/7/7-large_default.jpg",
        ]
        img_name = [  # These are the names that I want for my images
            "1",
            "2",
            "3",
        ]
        item["image_urls"] = img_urls
        item["image_name"] = img_name
        return item

The code of items:

import scrapy

class ImagenesItem(scrapy.Item):
    images = scrapy.Field()
    image_urls = scrapy.Field()
    image_name = scrapy.Field()

Code of Pipelines:

class CustomImageNamePipeline(ImagesPipeline): #I copied this code from the website

    def get_media_requests(self, item, info):
        return [Request(x, meta={'image_name': item["image_name"]})
                for x in item.get('image_urls', [])]

    def file_path(self, request, response=None, info=None):
        return '%s.jpg' % request.meta['image_name']

My settings:

BOT_NAME = 'Imagenes'
SPIDER_MODULES = ['Imagenes.spiders']
NEWSPIDER_MODULE = 'Imagenes.spiders'
ITEM_PIPELINES = {'scrapy.pipelines.images.ImagesPipeline': 1}
IMAGES_STORE = r"C:\Users\Orlando\Imagenes"

Upvotes: 1

Views: 742

Answers (2)

gangabass

Reputation: 10666

First you need to edit your settings.py:

ITEM_PIPELINES = {'Imagenes.pipelines.CustomImageNamePipeline': 1}

Next in your pipelines.py:

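import scrapy
from scrapy.pipelines.images import ImagesPipeline

class CustomImageNamePipeline(ImagesPipeline):

    def get_media_requests(self, item, info):
        # Each entry in image_urls is a dict with 'url' and 'name' keys (built in the spider)
        for image in item.get('image_urls', []):
            yield scrapy.Request(image["url"], meta={'image_name': image["name"]})

    # The keyword-only item argument is passed by newer Scrapy versions
    def file_path(self, request, response=None, info=None, *, item=None):
        # Save each image under the name carried in the request meta
        return '%s.jpg' % request.meta['image_name']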

and finally in your spider:

def parse(self, response):

    item = ImagenesItem()

    img_urls = [
        "http://automationpractice.com/img/p/5/5-large_default.jpg",
        "http://automationpractice.com/img/p/6/6-large_default.jpg",
        "http://automationpractice.com/img/p/7/7-large_default.jpg",
    ]
    img_names = [  # These are the names that I want for my images
        "1",
        "2",
        "3",
    ]

    images = []
    for image_url, image_name in zip(img_urls, img_names):
        images.append({'url': image_url, 'name': image_name})

    item["image_urls"] = images
    yield item
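
As a rough illustration of why the pairing matters: the pipeline in the question puts the whole image_name list into the meta of every request, so even with the custom pipeline enabled each image would be written to the same path. The sketch below uses plain Python with no Scrapy involved; original_paths and paired_paths are hypothetical helper names that only mimic the two file_path behaviours.

```python
img_urls = [
    "http://automationpractice.com/img/p/5/5-large_default.jpg",
    "http://automationpractice.com/img/p/6/6-large_default.jpg",
    "http://automationpractice.com/img/p/7/7-large_default.jpg",
]
img_names = ["1", "2", "3"]


def original_paths(urls, names):
    # Mimics the question's pipeline: every request carries the whole list,
    # so '%s.jpg' formats the list itself rather than one name
    return ['%s.jpg' % names for _ in urls]


def paired_paths(urls, names):
    # Mimics the pairing above: zip gives each URL its own name
    return ['%s.jpg' % name for _url, name in zip(urls, names)]


print(original_paths(img_urls, img_names))
print(paired_paths(img_urls, img_names))
```

The first call produces the same path three times, so each download would overwrite the previous one; the second produces one distinct file name per URL.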

Upvotes: 3

furas

Reputation: 142711

You have to register your CustomImageNamePipeline in the settings instead of the default ImagesPipeline.

If the class is in the file pipelines.py, then add this to settings.py:

ITEM_PIPELINES = {'pipelines.CustomImageNamePipeline': 1}

or possibly with the project name as a prefix:

ITEM_PIPELINES = {'Imagenes.pipelines.CustomImageNamePipeline': 1}

If you have all the code in one file (without creating a project), then add this in the same file:

ITEM_PIPELINES = {'__main__.CustomImageNamePipeline': 1}
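
The keys in ITEM_PIPELINES are dotted import paths, which is why the prefix (pipelines, Imagenes.pipelines, or __main__) has to match where the class actually lives. The sketch below is a simplified stand-in for that kind of lookup, not Scrapy's own code; resolve_dotted_path is a hypothetical helper, and a standard-library class is used in place of the pipeline so the example runs without a project.

```python
from importlib import import_module


def resolve_dotted_path(path):
    # Split "package.module.ClassName" into the module path and the class name
    module_path, _, name = path.rpartition('.')
    # The module part must be importable from where the code runs
    module = import_module(module_path)
    return getattr(module, name)


# Standard-library stand-in for 'Imagenes.pipelines.CustomImageNamePipeline'
cls = resolve_dotted_path('collections.OrderedDict')
print(cls.__name__)
```

If the module part of the path cannot be imported, the lookup fails before the class is ever reached, so a wrong prefix shows up as an import error rather than as a pipeline that silently does nothing.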

Upvotes: 0
