Reputation: 247
I'm quite new to Scrapy. I set up my project by running scrapy startproject tutorials in the terminal (I'm using Visual Studio Code). I checked that scrapy.cfg is in the same directory as my script. This is my code:
import scrapy

class QuoteSpider(scrapy.Spider):
    name = 'quotes'
    start_urls = [
        'http://quotes.toscrape.com/'
    ]

    def parse(self, response):
        title = response.css('title').extract()
        yield {'titleText': title}
My settings.py:
BOT_NAME = 'quotes'
SPIDER_MODULES = ['tutorials.spiders']
NEWSPIDER_MODULE = 'tutorials.spiders'
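For reference, SPIDER_MODULES = ['tutorials.spiders'] means Scrapy only looks for spider classes inside the tutorials.spiders package. This is a sketch of the layout scrapy startproject tutorials creates (the spider filename quotes_spider.py is just an example):

tutorials/
    scrapy.cfg
    tutorials/
        __init__.py
        items.py
        middlewares.py
        pipelines.py
        settings.py
        spiders/
            __init__.py
            quotes_spider.py    # the QuoteSpider class must live here, not next to scrapy.cfg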
And this is how I'm running it:
scrapy crawl quotes
I'm still unable to run the crawler. What could be wrong? Thanks.
Edit:
The error message I'm getting:
C:\Users\Mohamed\Desktop\python 1\test python\Solution Test - ALIOUA WALID\tutorials>scrapy crawl quotes
2020-02-26 09:48:35 [scrapy.utils.log] INFO: Scrapy 1.8.0 started (bot: quotes)
2020-02-26 09:48:35 [scrapy.utils.log] INFO: Versions: lxml 4.3.3.0, libxml2 2.9.5, cssselect 1.0.3, parsel 1.5.2, w3lib 1.20.0, Twisted 19.10.0, Python 3.6.4 (v3.6.4:d48eceb, Dec 19 2017, 06:54:40) [MSC v.1900 64 bit (AMD64)], pyOpenSSL 19.0.0 (OpenSSL 1.1.1b 26 Feb 2019), cryptography 2.6.1, Platform Windows-7-6.1.7601-SP1
Traceback (most recent call last):
  File "c:\users\mohamed\appdata\local\programs\python\python36\lib\site-packages\scrapy\spiderloader.py", line 69, in load
    return self._spiders[spider_name]
KeyError: 'quotes'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\users\mohamed\appdata\local\programs\python\python36\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\users\mohamed\appdata\local\programs\python\python36\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\Mohamed\AppData\Local\Programs\Python\Python36\Scripts\scrapy.exe\__main__.py", line 7, in <module>
  File "c:\users\mohamed\appdata\local\programs\python\python36\lib\site-packages\scrapy\cmdline.py", line 146, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "c:\users\mohamed\appdata\local\programs\python\python36\lib\site-packages\scrapy\cmdline.py", line 100, in _run_print_help
    func(*a, **kw)
  File "c:\users\mohamed\appdata\local\programs\python\python36\lib\site-packages\scrapy\cmdline.py", line 154, in _run_command
    cmd.run(args, opts)
  File "c:\users\mohamed\appdata\local\programs\python\python36\lib\site-packages\scrapy\commands\crawl.py", line 57, in run
    self.crawler_process.crawl(spname, **opts.spargs)
  File "c:\users\mohamed\appdata\local\programs\python\python36\lib\site-packages\scrapy\crawler.py", line 183, in crawl
    crawler = self.create_crawler(crawler_or_spidercls)
  File "c:\users\mohamed\appdata\local\programs\python\python36\lib\site-packages\scrapy\crawler.py", line 216, in create_crawler
    return self._create_crawler(crawler_or_spidercls)
  File "c:\users\mohamed\appdata\local\programs\python\python36\lib\site-packages\scrapy\crawler.py", line 220, in _create_crawler
    spidercls = self.spider_loader.load(spidercls)
  File "c:\users\mohamed\appdata\local\programs\python\python36\lib\site-packages\scrapy\spiderloader.py", line 71, in load
    raise KeyError("Spider not found: {}".format(spider_name))
KeyError: 'Spider not found: quotes'
Upvotes: 0
Views: 413
Reputation: 656
I'm not a Windows user or a Python expert, so I'm not going to try detailed debugging of your paths and such. But with the code you have posted, even once you get your paths fixed and the spider recognized, it's still not going to "crawl" the website, because there is no mechanism for it to find and follow links to additional URLs to scrape.
When you write "crawl", I'm assuming you mean multiple pages; if you just want the one page, I'd expect you to use terms like "fetch" or "parse" (or fetch, then parse).
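To illustrate, with a plain scrapy.Spider you have to follow links yourself. A minimal sketch of your spider doing that (the li.next selector is my assumption about the quotes site's pagination markup, not something from your post):

import scrapy

class QuoteSpider(scrapy.Spider):
    name = 'quotes'
    start_urls = ['http://quotes.toscrape.com/']

    def parse(self, response):
        # Yield the page title, as in your original spider.
        yield {'titleText': response.css('title::text').get()}

        # Manually follow the "next page" link so more than one page is scraped.
        next_page = response.css('li.next a::attr(href)').get()
        if next_page is not None:
            yield response.follow(next_page, callback=self.parse)

response.follow resolves relative URLs for you, which is why the raw href can be passed straight in.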
As others noted, try genspider, but also add the parameter for the crawl template. If memory serves, it's something like:
scrapy genspider -t crawl quotes quotes.toscrape.com
That will give you a spider template with built-in callbacks for finding and crawling additional URLs.
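Roughly, the generated file looks like this (a sketch from memory, so details vary by Scrapy version, and the allow pattern here is an illustrative placeholder you would adjust):

import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule

class QuotesSpider(CrawlSpider):
    name = 'quotes'
    allowed_domains = ['quotes.toscrape.com']
    start_urls = ['http://quotes.toscrape.com/']

    # Each Rule extracts links from every response and routes them to a callback;
    # follow=True keeps the crawl going from the linked pages as well.
    rules = (
        Rule(LinkExtractor(allow=r'/page/'), callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        # Placeholder callback: extract whatever fields you need here.
        yield {'titleText': response.css('title::text').get()}

Note that with CrawlSpider you must not override parse itself; the rules route matched links to parse_item instead.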
Upvotes: 2