Baka

Reputation: 759

How to crawl local HTML file with Scrapy

I tried to crawl a local HTML file stored on my desktop with the code below, but I run into errors such as "No such file or directory: '/robots.txt'" before the crawl even starts.

[Scrapy command]

$ scrapy crawl test -o test01.csv

[Scrapy spider]

import scrapy

class TestSpider(scrapy.Spider):
    name = 'test'
    allowed_domains = []
    start_urls = ['file:///Users/Name/Desktop/test/test.html']

[Errors]

2018-11-16 01:57:52 [scrapy.core.engine] INFO: Spider opened
2018-11-16 01:57:52 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-16 01:57:52 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6024
2018-11-16 01:57:52 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET file:///robots.txt> (failed 1 times): [Errno 2] No such file or directory: '/robots.txt'
2018-11-16 01:57:56 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET file:///robots.txt> (failed 2 times): [Errno 2] No such file or directory: '/robots.txt'

Upvotes: 4

Views: 3493

Answers (2)

Vitor Hugo

Reputation: 26

To solve the "No such file or directory: '/robots.txt'" error, you can go to your project's settings.py file and comment out the line:

#ROBOTSTXT_OBEY = True
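
Setting the flag to False explicitly has the same effect. A minimal sketch of the relevant lines in settings.py (everything else in the file stays as generated):

# settings.py
# With ROBOTSTXT_OBEY enabled, Scrapy requests /robots.txt before each
# crawl; a file:// URL has no robots.txt to fetch, hence the error above.
ROBOTSTXT_OBEY = False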

Upvotes: 0

Japes

Reputation: 56

When working locally, I never specify allowed_domains. Try taking that line out and see if it works.

In your error, it's testing the 'empty' domain that you have given it.
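
For reference, a minimal sketch of the spider with allowed_domains removed; the parse method and its XPath are illustrative additions, not from the question:

import scrapy

class TestSpider(scrapy.Spider):
    name = 'test'
    # No allowed_domains: a file:// URL has no domain to filter on
    start_urls = ['file:///Users/Name/Desktop/test/test.html']

    def parse(self, response):
        # Placeholder field so `-o test01.csv` has something to write
        yield {'title': response.xpath('//title/text()').get()}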

Upvotes: 2
