Reputation: 11
Cannot get scrapy tutorial to work.
I am trying to learn Scrapy, but I can't get even the tutorial to run. I have tried running it under Python 3.7 and 3.5.5 with the same results.
import scrapy

class QuotesSpider(scrapy.Spider):
    name = "quotes"

    def start_requests(self):
        urls = [
            'http://quotes.toscrape.com/page/1/',
            'http://quotes.toscrape.com/page/2/',
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        page = response.url.split("/")[-2]
        filename = 'quotes-%s.html' % page
        with open(filename, 'wb') as f:
            f.write(response.body)
        self.log('Saved file %s' % filename)
This appears to run OK. At least it throws no errors.
When I run "scrapy crawl quotes" in Anaconda prompt window, I get this:
"hed) C:\Users\userOne\python script files\scrapy\tutorial>scrapy crawl
quotes
2019-01-23 18:34:27 [scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot:
tutorial)
2019-01-23 18:34:27 [scrapy.utils.log] INFO: Versions: lxml 4.2.3.0, libxml2
2.9.5, cssselect 1.0.3, parsel 1.5.0, w3lib 1.19.0, Twisted 18.7.0, Python
3.5.5 | packaged by conda-forge | (default, Jul 24 2018, 01:52:17) [MSC
v.1900 64 bit (AMD64)], pyOpenSSL 18.0.0 (OpenSSL 1.0.2p 14 Aug 2018),
cryptography 2.3.1, Platform Windows-10-10.0.17134-SP0
Traceback (most recent call last):
File "C:\Users\userOne\Anaconda3\envs\hed\lib\site- packages\scrapy\spiderloader.py", line 69, in load
return self._spiders[spider_name]
KeyError: 'quotes'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\userOne\Anaconda3\envs\hed\Scripts\scrapy-script.py", line
10, in <module>
sys.exit(execute())
File "C:\Users\userOne\Anaconda3\envs\hed\lib\site- packages\scrapy\cmdline.py", line 150, in execute
_run_print_help(parser, _run_command, cmd, args, opts)
File "C:\Users\userOne\Anaconda3\envs\hed\lib\site- packages\scrapy\cmdline.py", line 90, in _run_print_help
func(*a, **kw)
File "C:\Users\userOne\Anaconda3\envs\hed\lib\site- packages\scrapy\cmdline.py", line 157, in _run_command
cmd.run(args, opts)
File "C:\Users\userOne\Anaconda3\envs\hed\lib\site- packages\scrapy\commands\crawl.py", line 57, in run
self.crawler_process.crawl(spname, **opts.spargs)
File "C:\Users\userOne\Anaconda3\envs\hed\lib\site- packages\scrapy\crawler.py", line 170, in crawl
crawler = self.create_crawler(crawler_or_spidercls)
File "C:\Users\userOne\Anaconda3\envs\hed\lib\site- packages\scrapy\crawler.py", line 198, in create_crawler
return self._create_crawler(crawler_or_spidercls)
File "C:\Users\userOne\Anaconda3\envs\hed\lib\site- packages\scrapy\crawler.py", line 202, in _create_crawler
spidercls = self.spider_loader.load(spidercls)
File "C:\Users\userOne\Anaconda3\envs\hed\lib\site- packages\scrapy\spiderloader.py", line 71, in load
raise KeyError("Spider not found: {}".format(spider_name))
KeyError: 'Spider not found: quotes'
"
The output should be similar to this:
"016-12-16 21:24:05 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2016-12-16 21:24:05 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2016-12-16 21:24:05 [scrapy.core.engine] DEBUG: Crawled (404) <GET http://quotes.toscrape.com/robots.txt> (referer: None)
2016-12-16 21:24:05 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/page/1/> (referer: None)
2016-12-16 21:24:05 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/page/2/> (referer: None)
2016-12-16 21:24:05 [quotes] DEBUG: Saved file quotes-1.html
2016-12-16 21:24:05 [quotes] DEBUG: Saved file quotes-2.html
2016-12-16 21:24:05 [scrapy.core.engine] INFO: Closing spider (finished)"
Thanks in advance for any help you can give.
Upvotes: 1
Views: 3101
Reputation: 1
You are missing the name variable. It has to be defined as a class attribute, not inside a method:

class QuotesSpider(scrapy.Spider):
    name = "quotes"  # class attribute, so Scrapy can find the spider

    def start_requests(self):
        urls = [
            'http://quotes.toscrape.com/page/1/',
            'http://quotes.toscrape.com/page/2/',
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)
Upvotes: 0
Reputation: 169
The name attribute is required and must be unique for every spider you create.
You can check this blog for getting started with Scrapy: https://www.inkoop.io/blog/web-scraping-using-python-and-scrapy/
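To see why a missing or mismatched name produces the "Spider not found" error, here is a simplified sketch (not Scrapy's actual implementation) of how the crawl command resolves a spider: discovered spider classes are indexed by their name class attribute, and the lookup fails exactly like the traceback in the question.

```python
# Simplified sketch (NOT Scrapy's real code) of spider-by-name lookup.
# Scrapy's SpiderLoader builds a similar mapping from the modules listed
# in SPIDER_MODULES; "scrapy crawl quotes" then looks up the key "quotes".

class QuotesSpider:
    name = "quotes"  # the key the crawl command looks up

def build_index(spider_classes):
    # Index every discovered spider class by its `name` attribute;
    # classes without a name can never be found by `crawl`.
    return {cls.name: cls for cls in spider_classes
            if getattr(cls, "name", None)}

def load(index, spider_name):
    # Mirrors the failing lookup in spiderloader.py.
    try:
        return index[spider_name]
    except KeyError:
        raise KeyError("Spider not found: {}".format(spider_name))

index = build_index([QuotesSpider])
print(load(index, "quotes").__name__)  # QuotesSpider
```

The same KeyError is raised whether the name attribute is wrong or the spider file was never discovered at all (for example, because it sits outside the spiders package).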
Upvotes: 0
Reputation: 11
I believe I found the answer. The tutorial omits one step that only appears in the command-line output after you create the project via
scrapy startproject tutorial
The output of that command, besides creating your tutorial project, is
You can start your first spider with:
cd tutorial
scrapy genspider example example.com
For the tutorial to work, you need to enter
scrapy genspider quotes quotes.toscrape.com
Upvotes: 1
Reputation: 51
Perhaps your source code has been placed in the wrong directory?
I had a very similar, if not the same, problem. (I am not using Anaconda, but the error was also "line 69, in load return self._spiders[spider_name] KeyError: 'quotes'".)
What fixed it for me was moving the source code file (quotes_spider.py) from the projectname/tutorial/tutorial directory to the projectname/tutorial/tutorial/spiders directory.
From the tutorial page . . . "This is the code for our first Spider. Save it in a file named quotes_spider.py under the tutorial/spiders directory in your project"
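For reference, after scrapy startproject tutorial the project layout looks roughly like this (the exact file list may vary slightly by Scrapy version); the spider file must sit inside the spiders package for the loader to discover it:

```
tutorial/                  # project root (contains scrapy.cfg)
    scrapy.cfg
    tutorial/
        __init__.py
        items.py
        middlewares.py
        pipelines.py
        settings.py
        spiders/
            __init__.py
            quotes_spider.py   # <- the spider code goes here
```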
Upvotes: 3