Amir Asyraf

Reputation: 680

Scraping and downloading images without a File Extension

I'm trying to use Scrapy's Image/File pipeline to download images without any file extension.

For example, this image:

https://burpple-2.imgix.net/foods/3d9294008d0f76a92e21647960_original.?w=400&h=400&fit=crop&q=80

As you can see, the image loads just fine, and I'm able to scrape the URL in Scrapy. However, passing the URL to image_urls or file_urls yields no downloaded images.

I've tried appending ".jpg" to the end of the URL, but that doesn't work either.

How would I download this kind of image?

EDIT:

I have already enabled ImagesPipeline. Downloading from other URLs that have a proper file extension works fine, and I can see the images being downloaded to the designated folders.

Upvotes: 0

Views: 1211

Answers (1)

Guillaume

Reputation: 1879

Have you enabled the ImagesPipeline in your settings?

You should be able to see an INFO log that looks like this:

2018-11-14 10:37:33 [scrapy.middleware] INFO: Enabled item pipelines:
['scrapy.pipelines.images.ImagesPipeline']

This code worked for me:

from scrapy.spiders import Spider

class MySpider(Spider):

    name = "burpple-2.imgix.net"
    start_urls = ['https://burpple-2.imgix.net/']

    # Enable the built-in images pipeline for this spider only
    # (ImagesPipeline needs Pillow to be installed)
    custom_settings = {
        'ITEM_PIPELINES': {'scrapy.pipelines.images.ImagesPipeline': 1},
        'IMAGES_STORE': '/some/valid/folder/',
    }

    def parse(self, response):
        # The pipeline downloads every URL listed under 'image_urls'
        yield {
            'image_urls': ['https://burpple-2.imgix.net/foods/3d9294008d0f76a92e21647960_original.?w=400&h=400&fit=crop&q=80'],
        }
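
As far as I know, ImagesPipeline names each stored file after a hash of the URL and saves it with a .jpg extension, so the missing extension should not matter there. If you go through file_urls and FilesPipeline instead, the stored name can depend on the URL, and a bare "." followed by a query string is not a useful extension. Below is a minimal sketch of how a clean name could be derived using only the standard library. The helper name storage_name and its default_ext parameter are hypothetical (not part of Scrapy), and falling back to ".jpg" is an assumption about what the server actually returns.

```python
import hashlib
import mimetypes
import posixpath
from urllib.parse import urlsplit


def storage_name(url, default_ext=".jpg"):
    """Build a 'full/<sha1-of-url><ext>' name (hypothetical helper)."""
    # Look only at the path component, so the query string is ignored
    ext = posixpath.splitext(urlsplit(url).path)[1].lower()
    # Keep the extension only if it is a known one; otherwise use the
    # assumed default (the real content type is not known from the URL)
    if ext not in mimetypes.types_map:
        ext = default_ext
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return f"full/{digest}{ext}"


url = ("https://burpple-2.imgix.net/foods/"
       "3d9294008d0f76a92e21647960_original.?w=400&h=400&fit=crop&q=80")
print(storage_name(url))
```

One way to use this would be to subclass FilesPipeline, override its file_path method to return storage_name(request.url), and register the subclass in ITEM_PIPELINES. I have not tested that against your Scrapy version, so treat it as a starting point.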

Upvotes: 2
