Vaibhav Jain

Reputation: 5507

Correct way of downloading images using scrapy

I am downloading images from this site. The problem is that each product has three images: one is used on this catalog page and the others are used on the product-specific page. I can download the images from the catalog page, but I also want to download the rest, whose URLs are given only on the product-specific page. Is there any way to download all the images at once? That is, I want to collect all product-related data in one shot.

For example, by making a request in the parse method to read the product page and extract the image URLs at the same time. Below is my spider with its parse method.

class ESpider(BaseSpider):
    name = "eSpider"
    allowed_domains = ["1click1call.com"]
    start_urls = ["http://1click1call.com/Jeans-Shirts-Tshirts-Trousers"]

    def parse(self, response):                      
        hxs = HtmlXPathSelector(response)        
        sites = hxs.select('//div[@class="bord"]')
        items = []
        for site in sites:
            item = EscraperItem()
            item['productSite'] = "http://1click1call.com/"
            item['productPrice'] = site.select('div[@class="price"]').extract()            
            item['productURL'] = site.select('div[@class="image"]/a/@href').extract()
            item['productTitle'] = site.select('div[@class="name"]/a/text()').extract()
            item['productImage'] = site.select('div[@class="image"]/a/img/@src').extract()
            item['productDesc'] = site.select('div[@class="description"]/text()').extract()
            item['image_urls'] = item['productImage']
            items.append(item)

        return items

For example, on this product page there are four images, and I want to extract all of them while I am crawling the product catalog.

To extract the product-specific images I am using these:

hxs.select('//div[@class="left"]//div[@class="image"]/a/@href').extract()
hxs.select('//div[@class="left"]//div[@class="image"]/a/img/@src').extract()
hxs.select('//div[@class="left"]//div[@class="image-additional"]/a/img/@src').extract()
hxs.select('//div[@class="left"]//div[@class="image-additional"]/a/@href').extract()

So I want to download these images as well while I am downloading the image from the catalog page, as I do above in the parse method. Is there an easy way of doing it? One way is to read the product URLs from the JSON file and then extract them. Is there any other way of doing it?

Upvotes: 4

Views: 2889

Answers (1)

akhter wahab

Reputation: 4085

Don't return the items from the parse method. Instead, yield a Request for each productURL, pass the item along in meta, and then yield/return the item in product_detail_page:

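    # requires: from scrapy.http import Request
    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        sites = hxs.select('//div[@class="bord"]')
        for site in sites:
            item = EscraperItem()
            item['productSite'] = "http://1click1call.com/"
            item['productPrice'] = site.select('div[@class="price"]').extract()
            item['productURL'] = site.select('div[@class="image"]/a/@href').extract()
            item['productTitle'] = site.select('div[@class="name"]/a/text()').extract()
            item['productImage'] = site.select('div[@class="image"]/a/img/@src').extract()
            item['productDesc'] = site.select('div[@class="description"]/text()').extract()
            # copy the list so later additions do not change productImage
            item['image_urls'] = list(item['productImage'])
            # follow the product link and carry the partly filled item along in meta
            yield Request(item['productURL'][0],
                          meta={'item': item},
                          callback=self.product_detail_page)

    def product_detail_page(self, response):
        hxs = HtmlXPathSelector(response)
        item = response.meta['item']
        # add the detail-page image URLs to item['image_urls'];
        # these selectors are the ones from the question, adjust as needed
        item['image_urls'] += hxs.select('//div[@class="left"]//div[@class="image"]/a/img/@src').extract()
        item['image_urls'] += hxs.select('//div[@class="left"]//div[@class="image-additional"]/a/img/@src').extract()
        yield item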
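
One thing to watch when you fill item['image_urls'] in the callback: the image pipeline turns every entry into a request, so the URLs have to be absolute, and the catalog image may show up again on the detail page. Below is a rough sketch of a helper for that step in plain Python. The name absolute_unique_urls is made up for illustration (it is not part of Scrapy), and it assumes that any relative URLs can be resolved against the page they are merged on.

```python
from urllib.parse import urljoin


def absolute_unique_urls(base_url, urls):
    """Resolve each URL against base_url and drop duplicates, keeping order.

    Hypothetical helper, not a Scrapy API.
    """
    seen = set()
    result = []
    for url in urls:
        # urljoin leaves absolute URLs untouched and resolves relative ones
        absolute = urljoin(base_url, url.strip())
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result
```

In product_detail_page you could then pass response.url as base_url together with the combined list of catalog and detail-page URLs, and assign the result to item['image_urls'] before yielding the item.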

Upvotes: 2
