Reputation: 25
I have tried reproducing the Scrapy tutorial using XPath and keep running into this error: ERROR: Spider must return Request, BaseItem or None, got 'dict' in <GET http://quotes.toscrape.com/>
I'm not sure how to fix this.
Here are snippets from the two relevant files, which should be enough for debugging:
1) My spider quotes_spider.py
from scrapy.spider import Spider
from scrapy import Request


class QuoteSpider(Spider):
    name = 'quotes'
    start_urls = [
        'http://quotes.toscrape.com/',
    ]

    def parse(self, response):
        for quote in response.xpath('//div[@class="quote"]'):
            yield {
                'text': quote.xpath('.//span[@class="text"]/text()').extract(),
                'author': quote.xpath('.//small[@class="author"]/text()').extract(),
                'tags': quote.xpath('.//div[@class="tags"]/a[@class="tag"]/text()').extract(),
            }
2) items.py
import scrapy  # needed because the fields below use scrapy.Field()
from scrapy.item import Item


class QuotesbotItem(Item):
    text = scrapy.Field()
    author = scrapy.Field()
    tags = scrapy.Field()
FYI: In case you compare this to the tutorial and wonder why I switched extract_first() to extract(), it's because I was seeing another error, exceptions.AttributeError: 'SelectorList' object has no attribute 'extract_first', which I believe is unrelated to this question.
Upvotes: 2
Views: 4794
Reputation: 146610
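As the error says, you are yielding a dictionary, not an Item. Older Scrapy releases (before 1.0, as far as I know) only accept Request or Item objects from a spider, so populate a QuotesbotItem and yield that instead: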
# Assumption: the project package is named quotesbot; adjust this import to your project.
from quotesbot.items import QuotesbotItem


class QuoteSpider(Spider):
    name = 'quotes'
    start_urls = [
        'http://quotes.toscrape.com/',
    ]

    def parse(self, response):
        for quote in response.xpath('//div[@class="quote"]'):
            # Fill an Item instance instead of building a plain dict
            item = QuotesbotItem()
            item['text'] = quote.xpath('.//span[@class="text"]/text()').extract()
            item['author'] = quote.xpath('.//small[@class="author"]/text()').extract()
            item['tags'] = quote.xpath('.//div[@class="tags"]/a[@class="tag"]/text()').extract()
            yield item
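To see why the dict is rejected while the Item is accepted, here is a rough sketch of the kind of type check that produces this message. It does not use Scrapy at all: Request, BaseItem, QuotesbotItem and check_spider_output below are simplified stand-ins written for illustration, and the real check inside Scrapy may differ in detail.

```python
# Simplified stand-ins for illustration only; these are NOT Scrapy's real classes.
class Request(object):
    """Stand-in for scrapy.Request."""


class BaseItem(object):
    """Stand-in for Scrapy's base item class."""


class QuotesbotItem(BaseItem):
    """Stand-in item; the real one declares Field() attributes."""

    def __init__(self, **values):
        self.values = dict(values)


def check_spider_output(output):
    """Hypothetical helper: label accepted outputs, else return the error text."""
    if isinstance(output, Request):
        return 'request'
    if isinstance(output, BaseItem):
        return 'item'
    if output is None:
        return 'none'
    # Anything else (such as a plain dict) is reported by its type name
    return "Spider must return Request, BaseItem or None, got %r" % type(output).__name__


print(check_spider_output({'text': 'hello'}))            # the dict is rejected
print(check_spider_output(QuotesbotItem(text='hello')))  # the Item subclass is accepted
```

A plain dict is not an instance of either accepted class, so it falls through to the error branch, whereas any subclass of the item base class passes.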
Upvotes: 2