akuiper

Reputation: 214957

How to catch requests forbidden by robots.txt in Scrapy?

How can I catch a request that is forbidden by robots.txt in Scrapy? Such requests seem to be dropped silently, i.e. nothing shows up in the output, so I can't tell what happens to those URLs. Ideally, whenever crawling a URL leads to this forbidden by robots.txt error, I'd like to output a record like {'url': url, 'status': 'forbidden by robots.txt'}. How can I do that?
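For context, here is a minimal spider that reproduces what I'm seeing (the spider name and URL are just placeholders for a page the target site's robots.txt disallows):

import scrapy

class TestSpider(scrapy.Spider):
    name = 'test'
    # placeholder: a URL disallowed by the target site's robots.txt
    start_urls = ['http://example.com/some-disallowed-page']

    def parse(self, response):
        # Never reached for the disallowed URL: with ROBOTSTXT_OBEY = True
        # in the project settings, the RobotsTxtMiddleware drops the request
        # before it is ever downloaded, so no record is produced.
        yield {'url': response.url, 'status': response.status}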

New to Scrapy. Appreciate any help.

Upvotes: 1

Views: 1477

Answers (1)

hckrman

Reputation: 136

Go to settings.py in the project folder and change ROBOTSTXT_OBEY = True to ROBOTSTXT_OBEY = False.
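In concrete terms, a minimal sketch of the change (assuming the settings.py generated by scrapy startproject, which includes ROBOTSTXT_OBEY = True):

# settings.py
ROBOTSTXT_OBEY = False

If you only want this for a single spider, the same setting can be overridden per spider via the custom_settings class attribute, e.g. custom_settings = {'ROBOTSTXT_OBEY': False}. Either way Scrapy stops checking robots.txt, so the previously skipped URLs are downloaded and passed to your callbacks like any other request.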

Upvotes: 2
