Reputation: 214957
How can I catch a request that is forbidden by robots.txt in Scrapy? Usually these requests seem to be ignored silently, i.e. nothing shows up in the output, so I can't tell what happens to those URLs. Ideally, if crawling a URL leads to this forbidden-by-robots.txt error, I'd like to output a record like {'url': url, 'status': 'forbidden by robots.txt'}. How can I do that?
I'm new to Scrapy, so I'd appreciate any help.
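For reference, a minimal sketch of one way such a record could be produced, assuming ROBOTSTXT_OBEY stays enabled and relying on Scrapy's RobotsTxtMiddleware raising IgnoreRequest for disallowed URLs; the spider name and URL below are placeholders, not part of the original question:

```python
import scrapy
from scrapy.exceptions import IgnoreRequest


class RobotsAwareSpider(scrapy.Spider):
    name = "robots_aware"  # hypothetical spider name
    start_urls = ["https://example.com/some-page"]  # placeholder URL

    custom_settings = {
        "ROBOTSTXT_OBEY": True,  # keep obeying robots.txt
    }

    def start_requests(self):
        for url in self.start_urls:
            # The errback fires when a downloader middleware (such as
            # RobotsTxtMiddleware) drops the request with IgnoreRequest.
            yield scrapy.Request(url, callback=self.parse, errback=self.errback)

    def parse(self, response):
        yield {"url": response.url, "status": response.status}

    def errback(self, failure):
        # RobotsTxtMiddleware raises IgnoreRequest for URLs disallowed by robots.txt.
        if failure.check(IgnoreRequest):
            yield {"url": failure.request.url, "status": "forbidden by robots.txt"}
```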
Upvotes: 1
Views: 1477
Reputation: 136
Go to settings.py in the project folder and change ROBOTSTXT_OBEY = True to ROBOTSTXT_OBEY = False.
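In settings.py that change would look like the following; note that it disables robots.txt handling for the whole project, so the previously blocked URLs are simply crawled:

```python
# settings.py (in the Scrapy project folder)

# When True, Scrapy's RobotsTxtMiddleware silently drops requests that
# robots.txt disallows. Setting it to False makes Scrapy crawl them instead.
ROBOTSTXT_OBEY = False
```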
Upvotes: 2