Reputation: 680
I'm trying to use Scrapy's Image/File pipeline to download images whose URLs have no file extension.
For example, this image:
https://burpple-2.imgix.net/foods/3d9294008d0f76a92e21647960_original.?w=400&h=400&fit=crop&q=80
As you can see, the image loads just fine, and I'm able to scrape the URL in Scrapy. However, passing the URL to image_urls or file_urls yields no downloaded images.
I've tried appending ".jpg" to the end of the URL, but that doesn't work.
How would I download this kind of image?
EDIT:
I have already enabled ImagesPipeline. Downloading from other URLs that have a proper file extension works fine, and I can see the images being downloaded to the designated folder.
Upvotes: 0
Views: 1211
Reputation: 1879
Have you enabled the ImagesPipeline in your settings?
You should be able to see an INFO log that looks like this:
2018-11-14 10:37:33 [scrapy.middleware] INFO: Enabled item pipelines:
['scrapy.pipelines.images.ImagesPipeline']
This code worked for me:
    from scrapy.spiders import Spider


    class MySpider(Spider):
        name = "burpple-2.imgix.net"
        start_urls = ['https://burpple-2.imgix.net/']

        custom_settings = {
            'ITEM_PIPELINES': {'scrapy.pipelines.images.ImagesPipeline': 1},
            'IMAGES_STORE': '/some/valid/folder/',
        }

        def parse(self, response):
            yield {
                'image_urls': ['https://burpple-2.imgix.net/foods/3d9294008d0f76a92e21647960_original.?w=400&h=400&fit=crop&q=80'],
            }
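If the download succeeds but you can't find the file, note that, as far as I know, the default ImagesPipeline does not take the file name or extension from the URL. It stores each image under IMAGES_STORE as full/<SHA-1 of the URL>.jpg. The sketch below reproduces that naming with the standard library only, so you can work out where to look. The helper name default_image_path is hypothetical (not part of Scrapy), and the naming scheme is an assumption you should check against file_path in your Scrapy version.

```python
import hashlib


def default_image_path(url: str) -> str:
    """Approximate the relative path the default ImagesPipeline is assumed to use.

    Assumption: the path is 'full/<sha1 hex digest of the URL>.jpg',
    relative to IMAGES_STORE. Verify against your Scrapy version.
    """
    image_guid = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return f"full/{image_guid}.jpg"


url = (
    "https://burpple-2.imgix.net/foods/"
    "3d9294008d0f76a92e21647960_original.?w=400&h=400&fit=crop&q=80"
)
# Prints the path (relative to IMAGES_STORE) where the image would be expected.
print(default_image_path(url))
```

Under that assumption, a missing extension in the URL should not matter for the stored file name, because the name comes from the hash and the pipeline saves the image as JPEG.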
Upvotes: 2