Reputation: 378
I cannot find a solution to the following problem. I am using Scrapy (latest version) and am trying to debug a spider.
Using scrapy shell https://jigsaw.w3.org/HTTP/300/301.html
-> it does not follow the redirect (it uses a default spider to fetch the data). When I run my own spider, it follows the 301, but then I cannot debug it.
How can I make the shell follow the 301 so that I can debug the final page?
Upvotes: 4
Views: 2086
Reputation: 21406
Scrapy uses its redirect middleware (RedirectMiddleware) to follow redirects; however, it is not enabled in the shell. A quick fix is to fetch the redirect target yourself:
scrapy shell "https://jigsaw.w3.org/HTTP/300/301.html"
fetch(response.headers['Location'].decode())  # header values are bytes on Python 3
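If the Location header turns out to be relative, or you want to handle the bytes value explicitly, the target URL can be resolved first. The following is a minimal sketch using only the standard library; `redirect_target` is a hypothetical helper name, and the header value used in the example is made up for illustration, not taken from the real page.

```python
from urllib.parse import urljoin


def redirect_target(base_url, location):
    """Return the absolute URL that a Location header points to.

    `location` may be bytes (as Scrapy returns header values on Python 3)
    or str, and may be relative to `base_url`.
    """
    if isinstance(location, bytes):
        # Assumption: the header value is ASCII/UTF-8 encoded.
        location = location.decode("utf-8")
    return urljoin(base_url, location)


# Hypothetical header value, for illustration only.
print(redirect_target("https://jigsaw.w3.org/HTTP/300/301.html", b"Overview.html"))
```

Inside the shell this would be used as fetch(redirect_target(response.url, response.headers['Location'])). Scrapy's own response.urljoin() does the same joining once the value has been decoded.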
Also, to debug your spider, you probably want to inspect the response it actually receives:
from scrapy.shell import inspect_response

# inside your spider class:
def parse(self, response):
    # the spider will stop here and open up an interactive shell during the run
    inspect_response(response, self)
Upvotes: 10