Reputation: 759
I tried to crawl a local HTML file stored on my desktop with the code below, but I encounter the following errors before the crawl starts, such as "No such file or directory: '/robots.txt'".
[Scrapy command]
$ scrapy crawl test -o test01.csv
[Scrapy spider]
import scrapy

class TestSpider(scrapy.Spider):
    name = 'test'
    allowed_domains = []
    start_urls = ['file:///Users/Name/Desktop/test/test.html']
[Errors]
2018-11-16 01:57:52 [scrapy.core.engine] INFO: Spider opened
2018-11-16 01:57:52 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-16 01:57:52 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6024
2018-11-16 01:57:52 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET file:///robots.txt> (failed 1 times): [Errno 2] No such file or directory: '/robots.txt'
2018-11-16 01:57:56 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET file:///robots.txt> (failed 2 times): [Errno 2] No such file or directory: '/robots.txt'
Upvotes: 4
Views: 3493
Reputation: 26
To solve the "No such file or directory: '/robots.txt'" error, go to your project's settings.py file and comment out the line:
#ROBOTSTXT_OBEY = True
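Equivalently, you can leave the line in place and set it to False. A minimal settings.py sketch (assuming the default layout generated by scrapy startproject):

# settings.py
# Don't fetch or obey robots.txt, so file:// URLs aren't blocked
ROBOTSTXT_OBEY = False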
Upvotes: 0
Reputation: 56
When working with local files, I never specify allowed_domains.
Try taking that line of code out and see if it works.
In your error, Scrapy is testing the 'empty' domain that you have given it.
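For reference, a minimal sketch of the spider with allowed_domains removed (the parse callback here is just a placeholder, not from the original question):

import scrapy

class TestSpider(scrapy.Spider):
    name = 'test'
    start_urls = ['file:///Users/Name/Desktop/test/test.html']

    def parse(self, response):
        # placeholder: extract whatever fields you actually need
        yield {'title': response.css('title::text').get()}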
Upvotes: 2