akuiper

Reputation: 214957

How to catch requests forbidden by robots.txt in Scrapy?

How can I catch a request that is forbidden by robots.txt in Scrapy? Such requests seem to be dropped silently, i.e. nothing shows up in the output, so I can't tell what happens to those URLs. Ideally, whenever crawling a URL leads to this forbidden by robots.txt error, I'd like to output a record like {'url': url, 'status': 'forbidden by robots.txt'}. How can I do that?
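For context, here is a minimal spider that reproduces what I'm seeing (the spider name and URL are just placeholders for a page the target site's robots.txt disallows):

import scrapy

class TestSpider(scrapy.Spider):
    name = 'test'
    # placeholder: a URL disallowed by the target site's robots.txt
    start_urls = ['http://example.com/some-disallowed-page']

    def parse(self, response):
        # Never reached for the disallowed URL: with ROBOTSTXT_OBEY = True
        # in the project settings, the RobotsTxtMiddleware drops the request
        # before it is ever downloaded, so no record is produced.
        yield {'url': response.url, 'status': response.status}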

New to Scrapy. Appreciate any help.

Upvotes: 1

Views: 1477

Answers (1)

hckrman

Reputation: 136

Go to settings.py in the project folder and change ROBOTSTXT_OBEY = True to ROBOTSTXT_OBEY = False.
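In concrete terms, a minimal sketch of the change (assuming the settings.py generated by scrapy startproject, which includes ROBOTSTXT_OBEY = True):

# settings.py
ROBOTSTXT_OBEY = False

If you only want this for a single spider, the same setting can be overridden per spider via the custom_settings class attribute, e.g. custom_settings = {'ROBOTSTXT_OBEY': False}. Either way Scrapy stops checking robots.txt, so the previously skipped URLs are downloaded and passed to your callbacks like any other request.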

Upvotes: 2
