Reputation: 1190
I am trying to download images of all products from here. My spider looks like:
from shopclues.items import ImgData
import scrapy

class multipleImages(scrapy.Spider):
    name = 'multipleImages'
    start_urls = ['http://www.shopclues.com/electronic-accessories-8/cameras-18/cameras-special.html?search=1&q1=camera',]

    def parse(self, response):
        for url in response.css('div.products-grid div.grid-product'):
            yield {
                ImgData(image_urls=[url.css('img::attr(src)').extract()])
            }
and items.py:
import scrapy
from scrapy.item import Item

class ShopcluesItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    pass

class ImgData(Item):
    image_urls = scrapy.Field()
    images = scrapy.Field()
But I get the following error when running the spider:
2016-09-29 11:56:19 [scrapy] DEBUG: Crawled (200) <GET http://www.shopclues.com/robots.txt> (referer: None)
2016-09-29 11:56:20 [scrapy] DEBUG: Crawled (200) <GET http://www.shopclues.com/electronic-accessories-8/cameras-18/cameras-special.html?search=1&q1=camera> (referer: None)
2016-09-29 11:56:20 [scrapy] ERROR: Spider must return Request, BaseItem, dict or None, got 'set' in <GET http://www.shopclues.com/electronic-accessories-8/cameras-18/cameras-special.html?search=1&q1=camera>
2016-09-29 11:56:20 [scrapy] ERROR: Spider must return Request, BaseItem, dict or None, got 'set' in <GET http://www.shopclues.com/electronic-accessories-8/cameras-18/cameras-special.html?search=1&q1=camera>
2016-09-29 11:56:20 [scrapy] ERROR: Spider must return Request, BaseItem, dict or None, got 'set' in <GET http://www.shopclues.com/electronic-accessories-8/cameras-18/cameras-special.html?search=1&q1=camera>
2016-09-29 11:56:20 [scrapy] ERROR: Spider must return Request, BaseItem, dict or None, got 'set' in <GET http://www.shopclues.com/electronic-accessories-8/cameras-18/cameras-special.html?search=1&q1=camera>
2016-09-29 11:56:20 [scrapy] ERROR: Spider must return Request, BaseItem, dict or None, got 'set' in <GET http://www.shopclues.com/electronic-accessories-8/cameras-18/cameras-special.html?search=1&q1=camera>
What does this error mean? What could be the possible reasons for it?
Upvotes: 10
Views: 18311
Reputation: 5240
Pass a list of urls to the pipeline.
def parse(self, response):
    images = ImgData()
    images['image_urls'] = []
    for url in response.css('div.products-grid div.grid-product'):
        images['image_urls'].append(url.css('img::attr(src)').extract_first())
    yield images
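For the item to reach an image-downloading pipeline at all, one has to be enabled in the project settings. The following is only a sketch, assuming you use Scrapy's built-in ImagesPipeline (which reads the image_urls field and fills in images, and needs Pillow installed); the storage directory name is a made-up placeholder.

```python
# settings.py (sketch): enable Scrapy's built-in images pipeline.
ITEM_PIPELINES = {
    'scrapy.pipelines.images.ImagesPipeline': 1,
}

# Hypothetical directory name; downloaded images are stored under it.
IMAGES_STORE = 'downloaded_images'
```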
Upvotes: 7
Reputation: 21436
Curly braces {} are Python's notation for both sets and dictionaries; which one you get depends on what you put inside them. Comma-separated values such as {a, b, c, d} make a set, while key: value pairs such as {a: b, c: d} make a dict. (Empty braces, {}, make an empty dict.)
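A quick way to see the difference for yourself is the small sketch below. The item tuple is just a hypothetical stand-in for a Scrapy item (any hashable object behaves the same way inside braces).

```python
# Hypothetical stand-in for a Scrapy item; any hashable object works here.
item = ("image_urls", "http://example.com/a.jpg")

as_set = {item}              # bare value inside braces -> set
as_dict = {"images": item}   # key: value pair inside braces -> dict
empty = {}                   # empty braces -> dict, not set

print(type(as_set).__name__)   # set
print(type(as_dict).__name__)  # dict
print(type(empty).__name__)    # dict
```

The first form is what the spider in the question yields, which is why Scrapy reports that it got 'set'.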
You yield a set in this line:
    yield {
        ImgData(image_urls=[url.css('img::attr(src)').extract()])
    }
I assume you want to yield a dictionary?
    yield {
        'images': ImgData(image_urls=[url.css('img::attr(src)').extract()]),
    }
Upvotes: 6