Will W

Reputation: 69

How to scrape lazy loading images using python Scrapy

Here is the code I used to crawl a web page. The site I want to scrape has lazy-loading images enabled, so Scrapy can only grab 10 out of 100 images; the rest are all placeholder.jpg. What would be the best way to deal with lazy-loading images in Scrapy?

Thanks!

class MasseffectSpider(scrapy.Spider):
    name = "massEffect"
    allowed_domains = ["amazon.com"]
    start_urls = [
        'file://127.0.0.1/home/ec2-user/scrapy/amazon/amazon.html',
    ]

    def parse(self, response):
        for item in items:
            listing = Item()
            listing['image'] = item.css('div.product img::attr(src)').extract()
            listing['url'] = item.css('div.item-name a::attr(href)').extract()
            listings.append(listing)

It seems other tools like CasperJS have a viewport that can be sized to force the images to load.

casper.start('http://m.facebook.com', function() {

    // The pretty HUGE viewport allows for roughly 1200 images.
    // If you need more you can either resize the viewport or scroll
    // down the viewport to load more DOM (probably the best approach).
    this.viewport(2048, 4096);

    this.fill('form#login_form', {
        'email': login_username,
        'pass':  login_password
    }, true);
});

Upvotes: 3

Views: 4619

Answers (2)

Muhammad Usman

Reputation: 231

To scrape lazy-loaded images, you have to track down the AJAX request that returns the images, then hit that request from Scrapy. After extracting all the data from a given page, send the extracted data on to the next callback via meta in the Scrapy request. For further help see Scrapy request.

Upvotes: 1

Rafael Almeida

Reputation: 5240

The problem is that the lazy loading is done by JavaScript, which Scrapy can't execute; CasperJS handles this for you.

To make this work with Scrapy you have to combine it with Selenium or scrapyjs.

Upvotes: 4
