Prashant Prabhakar Singh

Reputation: 1190

Spider must return Request, BaseItem, dict or None, got 'set'

I am trying to download images of all products from here. My spider looks like:

from shopclues.items import ImgData
import scrapy

class multipleImages(scrapy.Spider):
     name='multipleImages'
     start_urls=['http://www.shopclues.com/electronic-accessories-8/cameras-18/cameras-special.html?search=1&q1=camera',]

     def parse (self, response):
        for url in response.css('div.products-grid div.grid-product'):
                    yield {
                    ImgData(image_urls=[url.css('img::attr(src)').extract()])
                    }

and items.py:

import scrapy
from scrapy.item import Item
class ShopcluesItem(scrapy.Item):
   # define the fields for your item here like:
   # name = scrapy.Field()
   pass

class ImgData(Item):
    image_urls=scrapy.Field()
    images=scrapy.Field()

But I get the following error when running the spider:

2016-09-29 11:56:19 [scrapy] DEBUG: Crawled (200) <GET http://www.shopclues.com/robots.txt> (referer: None)
2016-09-29 11:56:20 [scrapy] DEBUG: Crawled (200) <GET http://www.shopclues.com/electronic-accessories-8/cameras-18/cameras-special.html?search=1&q1=camera> (referer: None)
2016-09-29 11:56:20 [scrapy] ERROR: Spider must return Request, BaseItem, dict or None, got 'set' in <GET http://www.shopclues.com/electronic-accessories-8/cameras-18/cameras-special.html?search=1&q1=camera>
2016-09-29 11:56:20 [scrapy] ERROR: Spider must return Request, BaseItem, dict or None, got 'set' in <GET http://www.shopclues.com/electronic-accessories-8/cameras-18/cameras-special.html?search=1&q1=camera>
2016-09-29 11:56:20 [scrapy] ERROR: Spider must return Request, BaseItem, dict or None, got 'set' in <GET http://www.shopclues.com/electronic-accessories-8/cameras-18/cameras-special.html?search=1&q1=camera>
2016-09-29 11:56:20 [scrapy] ERROR: Spider must return Request, BaseItem, dict or None, got 'set' in <GET http://www.shopclues.com/electronic-accessories-8/cameras-18/cameras-special.html?search=1&q1=camera>
2016-09-29 11:56:20 [scrapy] ERROR: Spider must return Request, BaseItem, dict or None, got 'set' in <GET http://www.shopclues.com/electronic-accessories-8/cameras-18/cameras-special.html?search=1&q1=camera>

What does this error mean? What could be causing it?

Upvotes: 10

Views: 18311

Answers (2)

Rafael Almeida

Reputation: 5240

Collect the image URLs into a single list and yield one item, so that the pipeline receives a list of URLs:

 def parse (self, response):
     images = ImgData()
     images['image_urls']=[] 
     for url in response.css('div.products-grid div.grid-product'):
         images['image_urls'].append(url.css('img::attr(src)').extract_first())
     yield images

Upvotes: 7

Granitosaurus

Reputation: 21436

In Python, curly braces define either a set or a dictionary, depending on what you put inside them. Bare values, as in {a, b, c, d}, make a set; key: value pairs, as in {a: b, c: d}, make a dict.
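The distinction is easy to check in an interpreter. A minimal sketch with made-up values (the variable names are illustrative only):

```python
# Bare values inside curly braces build a set.
as_set = {"a", "b", "c"}

# key: value pairs inside curly braces build a dict.
as_dict = {"a": 1, "b": 2}

# Empty braces are a dict; an empty set has to be written as set().
empty = {}

print(type(as_set).__name__)
print(type(as_dict).__name__)
print(type(empty).__name__)
```

This is why wrapping a single item in braces without a key produces a set rather than a dict.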

You yield a set in these lines:

yield {
    ImgData(image_urls=[url.css('img::attr(src)').extract()])
}

I assume you want to yield a dictionary?

yield {
    'images': ImgData(image_urls=[url.css('img::attr(src)').extract()]),
}
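A hedged aside: assuming the stock ImagesPipeline with default settings, the pipeline reads a top-level image_urls field holding a flat list of URL strings, so an item nested under an 'images' key may not trigger any downloads. Note also that extract() already returns a list, so wrapping it in [] produces a list inside a list. The sketch below uses plain Python only; build_item is a hypothetical helper (not part of Scrapy) and the URLs are made up:

```python
def build_item(srcs):
    """Hypothetical helper: wrap already-extracted src strings in a plain dict item."""
    # A flat list of strings under the top-level 'image_urls' key.
    return {"image_urls": list(srcs)}


# Made-up values standing in for url.css('img::attr(src)').extract()
extracted = ["http://example.com/a.jpg", "http://example.com/b.jpg"]

nested = {"image_urls": [extracted]}  # list inside a list, as in the question
flat = build_item(extracted)          # flat list of URL strings

print(nested["image_urls"])
print(flat["image_urls"])
```

In a spider, the equivalent would be to yield either the ImgData item itself or a dict with an image_urls key, without the extra braces or brackets.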

Upvotes: 6
