Reputation: 177
I am new to Scrapy and am trying to figure out why I can extract the elements I need from the Scrapy shell but not from the spider I created from the command line.
In scrapy shell I did the following:
pipenv run scrapy shell http://quotes.toscrape.com/
Then
response.css('small.author::text').extract()
Which returns the following:
['Albert Einstein', 'J.K. Rowling', 'Albert Einstein', 'Jane Austen', 'Marilyn Monroe', 'Albert Einstein', 'André Gide', 'Thomas A. Edison', 'Eleanor Roosevelt', 'Steve Martin']
This is all as intended. The problems start when I create a Scrapy spider and run it. My code is below:
# -*- coding: utf-8 -*-
import scrapy


class Yolo1Spider(scrapy.Spider):
    name = 'yolo1'
    allowed_domains = ['toscrape.com']
    start_urls = ['http://http://quotes.toscrape.com/']

    def parse(self, response):
        self.log('Just visited' + response.url)
        yield {
            'author': response.css('small.author::text').extract()
        }
I run the spider from the command line with:
pipenv run scrapy crawl yolo1
The errors I get are as follows:
2017-12-04 20:03:56 [yolo1] DEBUG: Just visitedhttp://www.dnsrsearch.com/index.php?origURL=http://http/quotes.toscrape.com/&bc=
2017-12-04 20:03:56 [scrapy.core.scraper] ERROR: Error processing {'author': []}
Traceback (most recent call last):
  File "c:\users\alice.virtualenvs\all-the-places-c44chfla\lib\site-packages\twisted\internet\defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "C:\Users\alice\all-the-places\locations\pipelines.py", line 16, in process_item
    ref = item['ref']
KeyError: 'ref'
I get the feeling I am missing something simple, but I cannot figure it out and have been searching all over the place.
The spider output shows that my debug line was printed, but after that I get an error. I expected the spider to produce the same output as the commands I ran in the shell.
Upvotes: 1
Views: 373
Reputation: 142681
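You made a mistake in the start URL: it contains http:// twice.
See http://http://quotes.toscrape.com/
It should be http://quotes.toscrape.com/. With the doubled prefix the spider ends up on a search page (the dnsrsearch.com URL in your DEBUG line) instead of the quotes site, which is why the selector returns an empty list.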
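A doubled scheme like this is easy to miss by eye, so here is a minimal sketch of a check you could run over start_urls before crawling. It uses only the standard library; the helper name find_suspicious_urls is made up for this illustration and is not part of Scrapy.

```python
from urllib.parse import urlparse


def find_suspicious_urls(urls):
    """Return the URLs whose scheme or host looks wrong (hypothetical helper)."""
    suspicious = []
    for url in urls:
        parsed = urlparse(url)
        # For 'http://http://quotes.toscrape.com/' the host parses as 'http',
        # because the second 'http:' sits where the host name belongs.
        bad_scheme = parsed.scheme not in ("http", "https")
        bad_host = parsed.hostname in (None, "http", "https")
        if bad_scheme or bad_host:
            suspicious.append(url)
    return suspicious


start_urls = ["http://http://quotes.toscrape.com/", "http://quotes.toscrape.com/"]
print(find_suspicious_urls(start_urls))
```

This flags only the first URL. Note that the KeyError: 'ref' in your traceback comes from pipelines.py in the all-the-places project, which reads item['ref'] from every item, so that error may persist after the URL is fixed unless the item carries a 'ref' key or the spider runs in a project without that pipeline.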
Upvotes: 1