Reputation: 5
While crawling I keep running into 403 status codes. The requirement is: automatically switch to a proxy whenever a 403 status code is encountered.
The status-code check and handling are configured in a downloader middleware, but it seems to work only for the first link:
def process_response(self, request, response, spider):
    status_code = [403]
    if response.status in status_code:
        spider.logger.debug('Error ======= %s %s , switching to proxy' % (response.status, request.url))
        import importlib
        settings = importlib.import_module('settings')
        proxy = AbuYunProxyMiddleware(settings=settings)
        request.meta['proxy'] = proxy.proxy_server
        request.headers['Proxy-Authorization'] = proxy.proxy_authorization
        return request
    return response
Upvotes: 0
Views: 107
Reputation: 872
I recommend you create a new class inheriting Scrapy's RetryMiddleware and override its process_response method.
from scrapy.downloadermiddlewares.retry import RetryMiddleware
from scrapy.utils.response import response_status_message


class ProxyRetryMiddleware(RetryMiddleware):

    def process_response(self, request, response, spider):
        if request.meta.get('dont_retry', False):
            return response
        if response.status in self.retry_http_codes:
            reason = response_status_message(response.status)
            # Add your proxy before the request is re-scheduled
            request.meta['proxy'] = PROXY
            return self._retry(request, reason, spider) or response
        return response
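Note that 403 is not in Scrapy's default RETRY_HTTP_CODES, so self.retry_http_codes will only match it if you add it in your settings. You also need to register your subclass in place of the built-in retry middleware. A minimal settings.py sketch, assuming the class above is called ProxyRetryMiddleware and lives in myproject/middlewares.py (both names are placeholders for your own project layout):

# settings.py
RETRY_ENABLED = True
# include 403 alongside Scrapy's default retry codes
RETRY_HTTP_CODES = [403, 500, 502, 503, 504, 522, 524, 408, 429]

DOWNLOADER_MIDDLEWARES = {
    # disable the built-in retry middleware and use the subclass instead
    'scrapy.downloadermiddlewares.retry.RetryMiddleware': None,
    'myproject.middlewares.ProxyRetryMiddleware': 550,
}

With this in place, every 403 response goes through _retry(), which re-queues the request (now carrying the proxy in its meta) until RETRY_TIMES is exhausted, rather than being filtered out after the first attempt.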
Upvotes: 1