Reputation: 5507
I am downloading images from this site. The problem is that each product has three images: one is used on this site's catalog page and the others are used on the product-specific page. I am able to download the catalog images, but I also want the rest of the images, whose URLs are only given on the product-specific page. Is there any way to download all the images in one go, i.e. to collect all the product-related data in one shot?
For example, can I make a request in the parse method to read the product page and extract its image URLs at the same time? Below is my spider with its parse method.
class ESpider(BaseSpider):
    name = "eSpider"
    allowed_domains = ["1click1call.com"]
    # start_urls must be a list of URLs, not a bare string
    start_urls = ["http://1click1call.com/Jeans-Shirts-Tshirts-Trousers"]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        sites = hxs.select('//div[@class="bord"]')
        items = []
        for site in sites:
            item = EscraperItem()
            item['productSite'] = "http://1click1call.com/"
            item['productPrice'] = site.select('div[@class="price"]').extract()
            item['productURL'] = site.select('div[@class="image"]/a/@href').extract()
            item['productTitle'] = site.select('div[@class="name"]/a/text()').extract()
            item['productImage'] = site.select('div[@class="image"]/a/img/@src').extract()
            item['productDesc'] = site.select('div[@class="description"]/text()').extract()
            item['image_urls'] = item['productImage']
            items.append(item)
        return items
For example, this product page has four images, and I want to extract all of them while I am crawling the product catalog.
To extract the images of a specific product I am using these:
hxs.select('//div[@class="left"]//div[@class="image"]/a/@href').extract()
hxs.select('//div[@class="left"]//div[@class="image"]/a/img/@src').extract()
hxs.select('//div[@class="left"]//div[@class="image-additional"]/a/img/@src').extract()
hxs.select('//div[@class="left"]//div[@class="image-additional"]/a/@href').extract()
So I want to download these images as well while I am downloading the images from the catalog page, as I am doing in the parse method above. Is there an easy way of doing this? One way is to read the product URLs from the JSON file and then extract the images, but is there any other way?
Upvotes: 4
Views: 2889
Reputation: 4085
Don't return the item from the parse method. Instead, yield a Request for the product URL, passing the partially filled item along in meta, and then yield/return the item in product_detail_page:
# requires: from scrapy.http import Request
def parse(self, response):
    hxs = HtmlXPathSelector(response)
    sites = hxs.select('//div[@class="bord"]')
    for site in sites:
        item = EscraperItem()
        item['productSite'] = "http://1click1call.com/"
        item['productPrice'] = site.select('div[@class="price"]').extract()
        item['productURL'] = site.select('div[@class="image"]/a/@href').extract()
        item['productTitle'] = site.select('div[@class="name"]/a/text()').extract()
        item['productImage'] = site.select('div[@class="image"]/a/img/@src').extract()
        item['productDesc'] = site.select('div[@class="description"]/text()').extract()
        item['image_urls'] = item['productImage']
        # follow the product link and carry the half-built item along in meta
        yield Request(item['productURL'][0],
                      meta={'item': item},
                      callback=self.product_detail_page)

def product_detail_page(self, response):
    hxs = HtmlXPathSelector(response)
    # the item created in parse() travels with the request
    item = response.meta['item']
    # add all images url's in item['image_urls']
    yield item
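The "add all image URLs" step in the detail-page callback could look roughly like the sketch below. It is only an illustration, not part of the original answer: the helper name merge_image_urls and the example.com URLs are made up, and it assumes Python 3 (urllib.parse). It joins the catalog image URLs with the ones extracted from the product page, turns relative URLs into absolute ones, and drops duplicates while keeping the order.

```python
from urllib.parse import urljoin


def merge_image_urls(page_url, existing, extracted):
    """Hypothetical helper: combine two lists of image URLs.

    Relative URLs are resolved against page_url, blank entries are
    skipped, and duplicates are removed while preserving order.
    """
    merged = []
    seen = set()
    for url in list(existing) + list(extracted):
        url = url.strip()
        if not url:
            # skip blank strings that an XPath extract() may return
            continue
        absolute = urljoin(page_url, url)
        if absolute not in seen:
            seen.add(absolute)
            merged.append(absolute)
    return merged


if __name__ == "__main__":
    # made-up URLs, purely for illustration
    catalog_images = ["http://example.com/img/a-small.jpg"]
    detail_images = ["/img/a-large.jpg", "http://example.com/img/a-small.jpg"]
    print(merge_image_urls("http://example.com/shop/product-1",
                           catalog_images, detail_images))
```

Inside product_detail_page this would presumably be called with response.url as page_url, item['image_urls'] as the existing list, and the result of the image-additional XPath expressions from the question as the extracted list, assigning the return value back to item['image_urls'] before yielding the item.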
Upvotes: 2