alioua walid

Reputation: 247

Unable to Run Scrapy Project

I'm quite new to Scrapy. I set up my project in the terminal with "scrapy startproject tutorials", and I'm working in Visual Studio Code.

I checked that:

  1. The name of my spider is exactly the one I'm calling.
  2. My scrapy.cfg is in the same folder I'm running the command from (see the layout sketch below).
  3. SPIDER_MODULES and NEWSPIDER_MODULE are spelled correctly in settings.py.
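
For reference, this is the layout I'd expect after "scrapy startproject tutorials" (a sketch; the spider filename is just whatever I saved mine as):

tutorials/
    scrapy.cfg              # scrapy crawl must be run from this folder
    tutorials/
        __init__.py
        items.py
        middlewares.py
        pipelines.py
        settings.py         # SPIDER_MODULES / NEWSPIDER_MODULE live here
        spiders/
            __init__.py
            quotes_spider.py    # the file containing QuoteSpider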

This is my code:

import scrapy

class QuoteSpider(scrapy.Spider):
    name = 'quotes'  # the name used with `scrapy crawl quotes`
    start_urls = [
        'http://quotes.toscrape.com/'
    ]

    def parse(self, response):
        # Extract the page's <title> element.
        title = response.css('title').extract()
        yield {'titleText': title}

My settings.py:

BOT_NAME = 'quotes'

SPIDER_MODULES = ['tutorials.spiders']
NEWSPIDER_MODULE = 'tutorials.spiders'

And this is how I'm running it:

scrapy crawl quotes
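
As a sanity check, "scrapy list" (run from the folder that contains scrapy.cfg) should print the names of the spiders the project can see:

scrapy list
quotes

If 'quotes' doesn't appear there, the project isn't finding my spider file at all.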

I'm still unable to run the crawler. What could be wrong? Thanks.

Edit:

The error message I'm getting:

C:\Users\Mohamed\Desktop\python 1\test python\Solution Test - ALIOUA WALID\tutorials>scrapy crawl quotes
2020-02-26 09:48:35 [scrapy.utils.log] INFO: Scrapy 1.8.0 started (bot: quotes)
2020-02-26 09:48:35 [scrapy.utils.log] INFO: Versions: lxml 4.3.3.0, libxml2 2.9.5, cssselect 1.0.3, parsel 1.5.2, w3lib 1.20.0, Twisted 19.10.0, Python 3.6.4 (v3.6.4:d48eceb, Dec 19 2017, 06:54:40) [MSC v.1900 64 bit (AMD64)], pyOpenSSL 19.0.0 (OpenSSL 1.1.1b  26 Feb 2019), cryptography 2.6.1, Platform Windows-7-6.1.7601-SP1
Traceback (most recent call last):
  File "c:\users\mohamed\appdata\local\programs\python\python36\lib\site-packages\scrapy\spiderloader.py", line 69, in load
    return self._spiders[spider_name]
KeyError: 'quotes'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\users\mohamed\appdata\local\programs\python\python36\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\users\mohamed\appdata\local\programs\python\python36\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\Mohamed\AppData\Local\Programs\Python\Python36\Scripts\scrapy.exe\__main__.py", line 7, in <module>
  File "c:\users\mohamed\appdata\local\programs\python\python36\lib\site-packages\scrapy\cmdline.py", line 146, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "c:\users\mohamed\appdata\local\programs\python\python36\lib\site-packages\scrapy\cmdline.py", line 100, in _run_print_help
    func(*a, **kw)
  File "c:\users\mohamed\appdata\local\programs\python\python36\lib\site-packages\scrapy\cmdline.py", line 154, in _run_command
    cmd.run(args, opts)
  File "c:\users\mohamed\appdata\local\programs\python\python36\lib\site-packages\scrapy\commands\crawl.py", line 57, in run
    self.crawler_process.crawl(spname, **opts.spargs)
  File "c:\users\mohamed\appdata\local\programs\python\python36\lib\site-packages\scrapy\crawler.py", line 183, in crawl
    crawler = self.create_crawler(crawler_or_spidercls)
  File "c:\users\mohamed\appdata\local\programs\python\python36\lib\site-packages\scrapy\crawler.py", line 216, in create_crawler
    return self._create_crawler(crawler_or_spidercls)
  File "c:\users\mohamed\appdata\local\programs\python\python36\lib\site-packages\scrapy\crawler.py", line 220, in _create_crawler
    spidercls = self.spider_loader.load(spidercls)
  File "c:\users\mohamed\appdata\local\programs\python\python36\lib\site-packages\scrapy\spiderloader.py", line 71, in load
    raise KeyError("Spider not found: {}".format(spider_name))
KeyError: 'Spider not found: quotes'

Upvotes: 0

Views: 413

Answers (1)

adam-asdf

Reputation: 656

I'm not a Windows user or a Python expert, so I'm not going to attempt detailed debugging of your paths. But with the code you've posted, even once your paths are fixed and the spider is found, it still isn't going to "crawl" the website, because there's no mechanism for it to find and follow links to additional URLs.
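
For instance (a sketch, not tested here; it assumes the "Next" pagination link on quotes.toscrape.com uses the li.next > a markup that the official Scrapy tutorial relies on), your parse method could queue the next page itself:

import scrapy

class QuoteSpider(scrapy.Spider):
    name = 'quotes'
    start_urls = ['http://quotes.toscrape.com/']

    def parse(self, response):
        yield {'titleText': response.css('title').extract()}

        # Follow the "Next" link, if present, so the spider keeps
        # crawling instead of stopping after the first page.
        next_page = response.css('li.next a::attr(href)').get()
        if next_page is not None:
            yield response.follow(next_page, callback=self.parse)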

When you write "crawl", I'm assuming you mean multiple pages; if you just wanted the one page, I'd expect you to use terms like "fetch" or "parse" (or fetch, then parse).

As others noted, try genspider, but also add the parameter for the crawl template. If memory serves, it's something like: scrapy genspider -t crawl quotes quotes.toscrape.com

That'll give you a spider template with built-in callbacks for finding and crawling additional URLs.
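
From memory, the generated file looks roughly like this (the real template may differ slightly, and the rules are something you'd tune yourself):

import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule

class QuotesSpider(CrawlSpider):
    name = 'quotes'
    allowed_domains = ['quotes.toscrape.com']
    start_urls = ['http://quotes.toscrape.com/']

    rules = (
        # Follow every extracted link, hand each response to
        # parse_item, and keep following links from those pages too.
        Rule(LinkExtractor(), callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        yield {'titleText': response.css('title').extract()}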

Upvotes: 2
