Reputation: 329
I want to check the response status and export it to a CSV file using Scrapy. I tried with response.status, but it only ever shows '200' and exports that to the CSV file. How do I get other status codes, like "404", "502", etc.?
def parse(self, response):
    yield {
        'URL': response.url,
        'Status': response.status
    }
Upvotes: 1
Views: 719
Reputation: 2120
You can add an errback to the request and then catch the HTTP error in the errback function and yield the required information there. There is more information about errback functions in the Scrapy docs. See the sample below:
import scrapy
from scrapy.spidermiddlewares.httperror import HttpError


class TestSpider(scrapy.Spider):
    name = 'test'
    allowed_domains = ['example.com']

    def start_requests(self):
        yield scrapy.Request(url="https://example.com/error", errback=self.parse_error)

    def parse_error(self, failure):
        if failure.check(HttpError):
            # these exceptions come from the HttpError spider middleware
            # you can get the non-200 response here
            response = failure.value.response
            yield {
                'URL': response.url,
                'Status': response.status
            }

    def parse(self, response):
        yield {
            'URL': response.url,
            'Status': response.status
        }
Upvotes: 0
Reputation: 17355
In your settings you can adjust these two options to make sure certain error codes are not automatically filtered by Scrapy.

HTTPERROR_ALLOWED_CODES
Default: []
Pass all responses with non-200 status codes contained in this list.

HTTPERROR_ALLOW_ALL
Default: False
Pass all responses, regardless of their status code.
settings.py
HTTPERROR_ALLOW_ALL = True
HTTPERROR_ALLOWED_CODES = [500, 501, 404 ...]
Upvotes: 1