Sean_Boothby

Reputation: 177

Different output from scrapy spider than from scrapy shell

I am new to Scrapy and am trying to figure out why I can extract the elements I need in the Scrapy shell but not from the spider I created from the command line.

In scrapy shell I did the following:

pipenv run scrapy shell http://quotes.toscrape.com/

Then

response.css('small.author::text').extract()

Which returns the following:

['Albert Einstein', 'J.K. Rowling', 'Albert Einstein', 'Jane Austen', 'Marilyn Monroe', 'Albert Einstein', 'André Gide', 'Thomas A. Edison', 'Eleanor Roosevelt', 'Steve Martin']

This is all as intended. But I start to have some issues when I create a scrapy spider and run it afterwards. My code is below:

# -*- coding: utf-8 -*-
import scrapy

class Yolo1Spider(scrapy.Spider):
    name = 'yolo1'
    allowed_domains = ['toscrape.com']
    start_urls = ['http://http://quotes.toscrape.com/']

    def parse(self, response):
        self.log('Just visited' + response.url)
        yield {
            'author': response.css('small.author::text').extract()
            }

I run the spider from the command line with:

pipenv run scrapy crawl yolo1

The errors I get are as follows:

2017-12-04 20:03:56 [yolo1] DEBUG: Just visitedhttp://www.dnsrsearch.com/index.php?origURL=http://http/quotes.toscrape.com/&bc=
2017-12-04 20:03:56 [scrapy.core.scraper] ERROR: Error processing {'author': []}
Traceback (most recent call last):
  File "c:\users\alice.virtualenvs\all-the-places-c44chfla\lib\site-packages\twisted\internet\defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "C:\Users\alice\all-the-places\locations\pipelines.py", line 16, in process_item
    ref = item['ref']
KeyError: 'ref'

I get the feeling I am just missing something simple but for the life of me I cannot figure it out and have been checking all over the place.

You can see in the output of the spider crawl that the debug line I wrote was printed, but after that I get an error. I expected to get the same output from the spider as from the shell.

Upvotes: 1

Views: 373

Answers (1)

furas

Reputation: 142681

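You made a mistake in start_urls - you have http:// twice.

See http://http://quotes.toscrape.com/

Because of that, the request went to a host named "http" and ended up on a DNS search page (the www.dnsrsearch.com URL in your DEBUG line). The selector matches nothing on that page, so 'author' is an empty list. It should be:

start_urls = ['http://quotes.toscrape.com/']

The KeyError itself is raised in locations/pipelines.py, which reads item['ref'], and your item only has an 'author' key. If that pipeline is still enabled for this spider, it will most likely keep raising KeyError even after the URL is fixed.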
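To see why the doubled scheme sends the request to the wrong place, here is a minimal sketch using only Python's standard library. The helpers host_of and check_start_urls are hypothetical names made up for illustration; they are not part of Scrapy. The sketch only shows how the URL gets parsed and how a quick sanity check against allowed_domains could catch this kind of typo before crawling.

```python
from urllib.parse import urlsplit


def host_of(url):
    """Return the hostname a URL actually points at (lowercased), or None."""
    return urlsplit(url).hostname


def check_start_urls(start_urls, allowed_domains):
    """Return the URLs whose host is not inside any allowed domain.

    Hypothetical helper for illustration only; not a Scrapy API.
    """
    bad = []
    for url in start_urls:
        host = host_of(url) or ""
        # A host is fine if it equals an allowed domain or is a subdomain of it.
        ok = any(host == d or host.endswith("." + d) for d in allowed_domains)
        if not ok:
            bad.append(url)
    return bad


if __name__ == "__main__":
    broken = "http://http://quotes.toscrape.com/"
    fixed = "http://quotes.toscrape.com/"

    print(host_of(broken))  # the second "http" is parsed as the host
    print(host_of(fixed))
    print(check_start_urls([broken, fixed], ["toscrape.com"]))
```

With the broken URL the parsed host is "http" rather than quotes.toscrape.com, which matches the origURL=http://http/quotes.toscrape.com/ part of the DEBUG line in the question.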

Upvotes: 1
