Reputation: 69
Here is the code I used for crawling a web page. The site I want to scrape has lazy loading enabled for its images, so Scrapy can only grab 10 out of 100 images; the rest are all placeholder.jpg. What would be the best way to deal with lazy-loaded images in Scrapy?
Thanks!
import scrapy


class MasseffectSpider(scrapy.Spider):
    name = "massEffect"
    allowed_domains = ["amazon.com"]
    start_urls = [
        'file://127.0.0.1/home/ec2-user/scrapy/amazon/amazon.html',
    ]

    def parse(self, response):
        # iterate over each product container on the page
        for item in response.css('div.item'):
            listing = {}
            listing['image'] = item.css('div.product img::attr(src)').extract()
            listing['url'] = item.css('div.item-name a::attr(href)').extract()
            yield listing
It seems other tools like CasperJS have a viewport that triggers the images to load.
casper.start('http://m.facebook.com', function() {
    // The pretty HUGE viewport allows for roughly 1200 images.
    // If you need more you can either resize the viewport or scroll
    // down the viewport to load more DOM (probably the best approach).
    this.viewport(2048, 4096);
    this.fill('form#login_form', {
        'email': login_username,
        'pass': login_password
    }, true);
});
Upvotes: 3
Views: 4619
Reputation: 231
To scrape lazy-loaded images, you have to track the AJAX request that returns the images, then hit that request from Scrapy. After extracting all the data from a given page, send the extracted data on to another callback via `meta` in the Scrapy request. For further help, see the documentation for Scrapy's Request.
Upvotes: 1
Reputation: 5240
The problem is that the lazy loading is done by JavaScript, which Scrapy can't execute; CasperJS handles this.
To make this work with Scrapy you have to combine it with Selenium or scrapyjs.
Upvotes: 4