L.S

Reputation: 5

How to automatically use a proxy when Scrapy encounters a 403 status code?

While crawling, some requests come back with a 403 status code. The requirement is to automatically switch to a proxy whenever a 403 is encountered.

The status-code check and response are configured in a downloader middleware, but it seems to work only for the first link:

def process_response(self, request, response, spider):
    status_code = [403]
    if response.status in status_code:
        spider.logger.debug('Error ======= %s %s, switching to the proxy' % (response.status, request.url))
        import importlib
        settings = importlib.import_module('settings')
        proxy = AbuYunProxyMiddleware(settings=settings)
        request.meta['proxy'] = proxy.proxy_server
        request.headers['Proxy-Authorization'] = proxy.proxy_authorization
        return request
    return response

Upvotes: 0

Views: 107

Answers (1)

Kamoo

Reputation: 872

I recommend creating a new class that inherits from Scrapy's RetryMiddleware and overriding its process_response method.

def process_response(self, request, response, spider):
    if request.meta.get('dont_retry', False):
        return response
    if response.status in self.retry_http_codes:
        reason = response_status_message(response.status)
        # Add your proxy
        request.meta["proxy"] = PROXY
        return self._retry(request, reason, spider) or response
    return response

Upvotes: 1
