N. Blood

Reputation: 1

Using Scrapy with Python, I fail to download images

I'm trying to scrape a few images from a website. Sorry in advance: I am not very experienced with Python, and this is the first time I have tried using Scrapy.

I apparently manage to get all the images I need, but they somehow get lost and my output folder remains empty.

I looked at a few tutorials and at all the similar questions I could find on SO, but nothing seemed to work.

My spider:

from testspider.items import TestspiderItem
import datetime
import scrapy

class PageSpider(scrapy.Spider):
    
    name = 'page-spider'
    start_urls = ['http://scan-vf.co/one_piece/chapitre-807/1']

    def parse(self, response):
        SET_SELECTOR = '.img-responsive'
        page = 1
        
        for imgPage in response.css(SET_SELECTOR):
            IMAGE_SELECTOR = 'img ::attr(src)'

            imgURL = imgPage.css(IMAGE_SELECTOR).extract_first()
            title = 'op-807-' + str(page)

            page += 1

            yield TestspiderItem({'title':title, 'image_urls':[imgURL]})

My items:

import scrapy

class TestspiderItem(scrapy.Item):

    title = scrapy.Field()
    image_urls = scrapy.Field()
    images = scrapy.Field()

My settings:

BOT_NAME = 'testspider'
SPIDER_MODULES = ['testspider.spiders']
NEWSPIDER_MODULE = 'testspider.spiders'
DEFAULT_ITEM_CLASS = 'testspider.items'
ROBOTSTXT_OBEY = True
ITEM_PIPELINES = {
    'scrapy.pipelines.images.ImagesPipeline': 1,
}
IMAGE_STORE = '/home/*******/documents/testspider/output'

If you could be so kind as to help me understand what's missing or incorrect, I would be grateful.

Upvotes: 0

Views: 145

Answers (1)

gangabass

Reputation: 10666

If you check the page source (usually Ctrl+U in a browser), you'll find that each img looks something like this:

<img class="img-responsive" src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7" data-src=' https://www.scan-vf.co/uploads/manga/one_piece/chapters/chapitre-807/01.jpg ' alt='One Piece: Chapter chapitre-807 - Page 1'/>

As you can see, src only holds a tiny base64-encoded placeholder GIF, and the real image URL is in data-src, so you need to use data-src in your code instead of src:

IMAGE_SELECTOR = 'img ::attr(data-src)'

Upvotes: 1
