Jaaks

Reputation: 11

Cannot run 'scrapy crawl quotes'

I cannot get the Scrapy tutorial to work.

I am trying to learn Scrapy but can't even get the tutorial to run. I have tried running it under Python 3.7 and 3.5.5 with the same results.

import scrapy

class QuotesSpider(scrapy.Spider):
    name = "quotes"

    def start_requests(self):
        urls = [
            'http://quotes.toscrape.com/page/1/',
            'http://quotes.toscrape.com/page/2/',
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        page = response.url.split("/")[-2]
        filename = 'quotes-%s.html' % page
        with open(filename, 'wb') as f:
            f.write(response.body)
        self.log('Saved file %s' % filename)

This appears to run OK. At least it throws no errors.

When I run "scrapy crawl quotes" in an Anaconda prompt window, I get this:

"hed) C:\Users\userOne\python script files\scrapy\tutorial>scrapy crawl 
 quotes
 2019-01-23 18:34:27 [scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: 
 tutorial)
 2019-01-23 18:34:27 [scrapy.utils.log] INFO: Versions: lxml 4.2.3.0, libxml2 
 2.9.5, cssselect 1.0.3, parsel 1.5.0, w3lib 1.19.0, Twisted 18.7.0, Python 
 3.5.5 | packaged by conda-forge | (default, Jul 24 2018, 01:52:17) [MSC 
 v.1900 64 bit (AMD64)], pyOpenSSL 18.0.0 (OpenSSL 1.0.2p  14 Aug 2018), 
 cryptography 2.3.1, Platform Windows-10-10.0.17134-SP0
 Traceback (most recent call last):
   File "C:\Users\userOne\Anaconda3\envs\hed\lib\site- packages\scrapy\spiderloader.py", line 69, in load
     return self._spiders[spider_name]
 KeyError: 'quotes'

 During handling of the above exception, another exception occurred:

 Traceback (most recent call last):
   File "C:\Users\userOne\Anaconda3\envs\hed\Scripts\scrapy-script.py", line 
 10, in <module>
     sys.exit(execute())
   File "C:\Users\userOne\Anaconda3\envs\hed\lib\site- packages\scrapy\cmdline.py", line 150, in execute
     _run_print_help(parser, _run_command, cmd, args, opts)
   File "C:\Users\userOne\Anaconda3\envs\hed\lib\site- packages\scrapy\cmdline.py", line 90, in _run_print_help
     func(*a, **kw)
   File "C:\Users\userOne\Anaconda3\envs\hed\lib\site- packages\scrapy\cmdline.py", line 157, in _run_command
     cmd.run(args, opts)
   File "C:\Users\userOne\Anaconda3\envs\hed\lib\site- packages\scrapy\commands\crawl.py", line 57, in run
     self.crawler_process.crawl(spname, **opts.spargs)
   File "C:\Users\userOne\Anaconda3\envs\hed\lib\site- packages\scrapy\crawler.py", line 170, in crawl
     crawler = self.create_crawler(crawler_or_spidercls)
   File "C:\Users\userOne\Anaconda3\envs\hed\lib\site- packages\scrapy\crawler.py", line 198, in create_crawler
     return self._create_crawler(crawler_or_spidercls)
   File "C:\Users\userOne\Anaconda3\envs\hed\lib\site- packages\scrapy\crawler.py", line 202, in _create_crawler
     spidercls = self.spider_loader.load(spidercls)
   File "C:\Users\userOne\Anaconda3\envs\hed\lib\site- packages\scrapy\spiderloader.py", line 71, in load
     raise KeyError("Spider not found: {}".format(spider_name))
 KeyError: 'Spider not found: quotes'

"

The output should be similar to this:

"016-12-16 21:24:05 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2016-12-16 21:24:05 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2016-12-16 21:24:05 [scrapy.core.engine] DEBUG: Crawled (404) <GET http://quotes.toscrape.com/robots.txt> (referer: None)
2016-12-16 21:24:05 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/page/1/> (referer: None)
2016-12-16 21:24:05 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/page/2/> (referer: None)
2016-12-16 21:24:05 [quotes] DEBUG: Saved file quotes-1.html
2016-12-16 21:24:05 [quotes] DEBUG: Saved file quotes-2.html
2016-12-16 21:24:05 [scrapy.core.engine] INFO: Closing spider (finished)"

Thanks in advance for any help you can give.

Upvotes: 1

Views: 3101

Answers (4)

Vikram Sarkar

Reputation: 1

You are missing the name variable.

class QuotesSpider(scrapy.Spider):
    name = "quotes"

    def start_requests(self):
        urls = [
            'http://quotes.toscrape.com/page/1/',
            'http://quotes.toscrape.com/page/2/',
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)
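With name defined at class level, you can verify that Scrapy registers the spider by running the following from the project directory:

scrapy list

which should print quotes.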

Upvotes: 0

Jaskaran Singh

Reputation: 169

A name attribute is required and must be unique for every spider you create.
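For example, a minimal sketch of a spider carrying the class-level attribute (the URL is from the tutorial; the rest is illustrative):

import scrapy

class QuotesSpider(scrapy.Spider):
    # "scrapy crawl quotes" is resolved by matching this attribute,
    # so it must be unique across the spiders in a project.
    name = "quotes"
    start_urls = ['http://quotes.toscrape.com/page/1/']

    def parse(self, response):
        # Minimal callback: just log which page was fetched.
        self.log('Visited %s' % response.url)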

You can check this blog post for getting started with Scrapy: https://www.inkoop.io/blog/web-scraping-using-python-and-scrapy/

Upvotes: 0

Till Dzierzon

Reputation: 11

I believe I found the answer. The tutorial doesn't mention one step that only appears in the command-line output after you create the project via

scrapy startproject tutorial

The output for that command, besides creating your tutorial project, is

You can start your first spider with:
cd tutorial
scrapy genspider example example.com

For the tutorial to work, you need to enter

scrapy genspider quotes quotes.toscrape.com
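On Scrapy 1.5 the generated file, tutorial/spiders/quotes.py, looks roughly like this (the template varies slightly between versions):

# -*- coding: utf-8 -*-
import scrapy

class QuotesSpider(scrapy.Spider):
    name = 'quotes'
    allowed_domains = ['quotes.toscrape.com']
    start_urls = ['http://quotes.toscrape.com/']

    def parse(self, response):
        # The generated template leaves the parsing logic to you.
        pass

The important part is that genspider writes the file into the tutorial/spiders package, which is where the spider loader looks when "scrapy crawl quotes" resolves the name.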

Upvotes: 1

o4tuna

Reputation: 51

Perhaps your source code has been placed in the wrong directory?

I had a very similar, if not the same, problem. (I am not using Anaconda, but the error was also "line 69, in load return self._spiders[spider_name] KeyError: 'quotes'".)

What fixed it for me was moving the source code file (quotes_spider.py) from the projectname/tutorial/tutorial directory to the projectname/tutorial/tutorial/spiders directory.

From the tutorial page: "This is the code for our first Spider. Save it in a file named quotes_spider.py under the tutorial/spiders directory in your project."
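For reference, the layout that scrapy startproject tutorial creates looks roughly like this (middlewares.py may be absent in older versions); the spider file must sit inside the spiders package:

tutorial/
    scrapy.cfg            # deploy configuration file
    tutorial/             # the project's Python module
        __init__.py
        items.py
        middlewares.py
        pipelines.py
        settings.py
        spiders/
            __init__.py
            quotes_spider.py   # the tutorial's spider belongs here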

Upvotes: 3
