Raisul Islam

Reputation: 329

How to check response status for http error codes using Scrapy?

I want to check the response status and export it to a CSV file using Scrapy. I tried response.status, but it only shows '200', which gets exported to the CSV file. How can I get the other status codes like "404", "502", etc.?

def parse(self, response):
        yield {
            'URL': response.url,
            'Status': response.status
        }

Upvotes: 1

Views: 719

Answers (2)

msenior_

Reputation: 2120

You can add an errback to the request, catch the HTTP error in the errback function, and yield the required information from there. You can read more about errbacks in the Scrapy docs. See the sample below:

import scrapy
from scrapy.spidermiddlewares.httperror import HttpError


class TestSpider(scrapy.Spider):
    name = 'test'
    allowed_domains = ['example.com']

    def start_requests(self):
        yield scrapy.Request(url="https://example.com/error", errback=self.parse_error)

    def parse_error(self, failure):
        if failure.check(HttpError):
            # these exceptions come from HttpError spider middleware
            # you can get the non-200 response
            response = failure.value.response
            yield {
                'URL': response.url,
                'Status': response.status
            }

    def parse(self, response):
        yield {
            'URL': response.url,
            'Status': response.status
        }
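If you run the spider with the built-in CSV feed exporter (for example scrapy crawl test -o statuses.csv, output file name chosen here only for illustration), the items yielded from both parse and parse_error land in the same file, so successful and failed URLs are exported together with their status codes.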

Upvotes: 0

Alexander

Reputation: 17355

In your settings you can adjust the following options to make sure responses with certain error codes are not automatically filtered out by Scrapy.

HTTPERROR_ALLOWED_CODES

Default: []

Pass all responses with non-200 status codes contained in this list.

HTTPERROR_ALLOW_ALL

Default: False

Pass all responses, regardless of their status code.

settings.py


HTTPERROR_ALLOW_ALL = True

HTTPERROR_ALLOWED_CODES = [500, 501, 404]  # ... or list only the specific codes you want to allow through
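
With HTTPERROR_ALLOW_ALL = True (or the relevant codes added to HTTPERROR_ALLOWED_CODES), non-200 responses are passed on to the normal callback, so the parse method from the question works unchanged. A minimal sketch, assuming the setting is applied per spider via custom_settings, with the spider name and URLs chosen only for illustration:

import scrapy


class StatusSpider(scrapy.Spider):
    name = 'status_check'
    # per-spider alternative to putting the option in settings.py
    custom_settings = {
        'HTTPERROR_ALLOW_ALL': True,
    }
    start_urls = [
        'https://example.com/',
        'https://example.com/missing-page',  # a 404 would normally be filtered out
    ]

    def parse(self, response):
        # non-200 responses now reach this callback as well,
        # so their status codes can be exported like any other item
        yield {
            'URL': response.url,
            'Status': response.status,
        }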

Upvotes: 1
