Tim

Reputation: 2202

Python Scrapy tutorial KeyError: 'Spider not found: juno'

I'm trying to write my first Scrapy spider. I've been following the tutorial at http://doc.scrapy.org/en/latest/intro/tutorial.html, but I'm getting the error `KeyError: 'Spider not found: juno'`.

I think I'm running the command from the correct directory (the one with the scrapy.cfg file):

(proscraper)#( 10/14/14@ 2:06pm )( tim@localhost ):~/Workspace/Development/hacks/prosum-scraper/scrapy
   tree
.
├── scrapy
│   ├── __init__.py
│   ├── items.py
│   ├── pipelines.py
│   ├── settings.py
│   └── spiders
│       ├── __init__.py
│       └── juno_spider.py
└── scrapy.cfg

2 directories, 7 files
(proscraper)#( 10/14/14@ 2:13pm )( tim@localhost ):~/Workspace/Development/hacks/prosum-scraper/scrapy
   ls
scrapy  scrapy.cfg

Here is the error I'm getting:

(proscraper)#( 10/14/14@ 2:13pm )( tim@localhost ):~/Workspace/Development/hacks/prosum-scraper/scrapy
   scrapy crawl juno
/home/tim/.virtualenvs/proscraper/lib/python2.7/site-packages/twisted/internet/_sslverify.py:184: UserWarning: You do not have the service_identity module installed. Please install it from <https://pypi.python.org/pypi/service_identity>. Without the service_identity module and a recent enough pyOpenSSL tosupport it, Twisted can perform only rudimentary TLS client hostnameverification.  Many valid certificate/hostname mappings may be rejected.
  verifyHostname, VerificationError = _selectVerifyImplementation()
Traceback (most recent call last):
  File "/home/tim/.virtualenvs/proscraper/bin/scrapy", line 9, in <module>
    load_entry_point('Scrapy==0.24.4', 'console_scripts', 'scrapy')()
  File "/home/tim/.virtualenvs/proscraper/lib/python2.7/site-packages/scrapy/cmdline.py", line 143, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/home/tim/.virtualenvs/proscraper/lib/python2.7/site-packages/scrapy/cmdline.py", line 89, in _run_print_help
    func(*a, **kw)
  File "/home/tim/.virtualenvs/proscraper/lib/python2.7/site-packages/scrapy/cmdline.py", line 150, in _run_command
    cmd.run(args, opts)
  File "/home/tim/.virtualenvs/proscraper/lib/python2.7/site-packages/scrapy/commands/crawl.py", line 58, in run
    spider = crawler.spiders.create(spname, **opts.spargs)
  File "/home/tim/.virtualenvs/proscraper/lib/python2.7/site-packages/scrapy/spidermanager.py", line 44, in create
    raise KeyError("Spider not found: %s" % spider_name)
KeyError: 'Spider not found: juno'

This is my virtualenv:

(proscraper)#( 10/14/14@ 2:13pm )( tim@localhost ):~/Workspace/Development/hacks/prosum-scraper/scrapy
   pip freeze
Scrapy==0.24.4
Twisted==14.0.2
cffi==0.8.6
cryptography==0.6
cssselect==0.9.1
ipdb==0.8
ipython==2.3.0
lxml==3.4.0
pyOpenSSL==0.14
pycparser==2.10
queuelib==1.2.2
six==1.8.0
w3lib==1.10.0
wsgiref==0.1.2
zope.interface==4.1.1

Here is the code for my spider, with the name attribute filled in:

(proscraper)#( 10/14/14@ 2:14pm )( tim@localhost ):~/Workspace/Development/hacks/prosum-scraper/scrapy
   cat scrapy/spiders/juno_spider.py 
import scrapy

class JunoSpider(scrapy.Spider):
    name = "juno"
    allowed_domains = ["http://www.juno.co.uk/"]
    start_urls = [
        "http://www.juno.co.uk/dj-equipment/"
    ]

    def parse(self, response):
        filename = response.url.split("/")[-2]
        with open(filename, 'wb') as f:
            f.write(response.body)

Upvotes: 7

Views: 10765

Answers (1)

dreyescat

Reputation: 13798

When you start a project with scrapy as the project name (i.e. scrapy startproject scrapy), it creates the directory structure you printed:

.
├── scrapy
│   ├── __init__.py
│   ├── items.py
│   ├── pipelines.py
│   ├── settings.py
│   └── spiders
│       ├── __init__.py
│       └── juno_spider.py
└── scrapy.cfg

But using scrapy as the project name has a side effect. If you open the generated scrapy.cfg, you will see that the default settings entry points to your scrapy.settings module:

[settings]
default = scrapy.settings

When we cat the scrapy/settings.py file, we see:

BOT_NAME = 'scrapy'

SPIDER_MODULES = ['scrapy.spiders']
NEWSPIDER_MODULE = 'scrapy.spiders'

Well, nothing strange here: the bot name, the list of modules where Scrapy will look for spiders, and the module where the genspider command will create new spiders. So far, so good.

Now let's check the scrapy library itself. It is properly installed under your isolated proscraper virtualenv, in the /home/tim/.virtualenvs/proscraper/lib/python2.7/site-packages/scrapy directory. Remember that site-packages is always added to sys.path, the list of paths where Python searches for modules. So, guess what: the scrapy library also has a settings module, /home/tim/.virtualenvs/proscraper/lib/python2.7/site-packages/scrapy/settings, which imports /home/tim/.virtualenvs/proscraper/lib/python2.7/site-packages/scrapy/settings/default_settings.py, the file that holds the default values for all the settings. Pay special attention to its default SPIDER_MODULES entry:
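The shadowing mechanism is easy to reproduce in isolation. This standalone sketch (nothing to do with Scrapy's actual loader) creates two directories that each provide a module named settings, and shows that the one whose directory comes first in sys.path wins the import:

```python
import sys
import tempfile
from pathlib import Path

# Two directories, each providing a module called "settings".
first = Path(tempfile.mkdtemp())
second = Path(tempfile.mkdtemp())
(first / "settings.py").write_text("ORIGIN = 'project'\n")
(second / "settings.py").write_text("ORIGIN = 'library'\n")

# Whichever directory appears earlier in sys.path wins the import.
sys.path.insert(0, str(second))
sys.path.insert(0, str(first))  # 'first' now precedes 'second'

import settings
print(settings.ORIGIN)  # -> project
```

Swap the two insert calls and the same import resolves to the other file; that ordering race is exactly what decides between your project's scrapy.settings and the library's.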

SPIDER_MODULES = []

Maybe you are starting to see what is happening. Choosing scrapy as the project name also generated a scrapy.settings module that clashes with the scrapy library's own scrapy.settings. From there, the order in which the corresponding paths were inserted into sys.path determines which one Python imports: the first to appear wins. In this case the library's settings wins, so SPIDER_MODULES stays empty and no spiders are loaded. Hence the KeyError: 'Spider not found: juno'.
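The lookup that raises that error can be sketched roughly like this (illustrative only, not Scrapy's real implementation): the spider manager collects the spider classes found in the SPIDER_MODULES modules, indexes them by their name attribute, and raises a KeyError when a name is missing. With SPIDER_MODULES = [], the registry is empty and every crawl name fails:

```python
class Spider(object):
    """Stand-in for scrapy.Spider (illustrative only)."""
    name = None

class JunoSpider(Spider):
    name = "juno"

def build_registry(spider_classes):
    # Index spiders by their `name` attribute, as the manager does.
    return {cls.name: cls for cls in spider_classes}

def create(registry, spider_name):
    if spider_name not in registry:
        raise KeyError("Spider not found: %s" % spider_name)
    return registry[spider_name]()

# With an empty SPIDER_MODULES, no classes are collected:
try:
    create(build_registry([]), "juno")
except KeyError as e:
    print(e)  # -> 'Spider not found: juno'

# With the project's spiders module actually reachable, the lookup works:
spider = create(build_registry([JunoSpider]), "juno")
print(spider.name)  # -> juno
```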

To solve this conflict you could rename your project folder to another name, let's say scrap:

.
├── scrap
│   ├── __init__.py

Modify your scrapy.cfg to point to the proper settings module:

[settings]
default = scrap.settings

And update your scrap.settings to point to the proper spiders:

SPIDER_MODULES = ['scrap.spiders']
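The three steps above can also be scripted. This rough sketch first builds a throwaway copy of the problematic layout so it is self-contained; the demo directory name and the paths are illustrative:

```shell
# Recreate a minimal copy of the problematic layout (illustrative).
mkdir -p demo/scrapy/spiders
printf '[settings]\ndefault = scrapy.settings\n' > demo/scrapy.cfg
printf "SPIDER_MODULES = ['scrapy.spiders']\n" > demo/scrapy/settings.py
cd demo

# 1. Rename the project package...
mv scrapy scrap
# 2. ...point scrapy.cfg at the renamed settings module...
sed -i 's/scrapy\.settings/scrap.settings/' scrapy.cfg
# 3. ...and fix SPIDER_MODULES in the renamed package.
sed -i 's/scrapy\.spiders/scrap.spiders/' scrap/settings.py

cat scrapy.cfg
```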

But, as @paultrmbrth suggested, I would simply recreate the project with another name.

Upvotes: 10
