How to get response body in scrapy downloader middleware

Question

I need to be able to retry the request if certain xpaths were not found on the page. So I wrote this middleware:

from scrapy.downloadermiddlewares.retry import RetryMiddleware


class ManualRetryMiddleware(RetryMiddleware):
    def process_response(self, request, response, spider):
        if not spider.retry_if_not_found:
            return response
        if not hasattr(response, 'text') and response.status != 200:
            return super(ManualRetryMiddleware, self).process_response(request, response, spider)
        found = False
        for xpath in spider.retry_if_not_found:
            if response.xpath(xpath).extract():
                found = True
                break
        if not found:
            return self._retry(request, "Didn't find anything useful", spider)
        return response

And registered it in settings.py:

DOWNLOADER_MIDDLEWARES = {
    'myproject.middlewares.ManualRetryMiddleware': 650,
    'scrapy.downloadermiddlewares.retry.RetryMiddleware': None,
}

When I run the spider, I get:

AttributeError: 'Response' object has no attribute 'xpath'

I tried to manually create a selector and run the xpath on it, but the response has no text property, and response.body is bytes, not str.

So how can I check page content in middleware? It's possible that some pages won't contain details that I need, so I'd like to be able to try them again later.

Tomáš Linhart · Accepted Answer

The response has no xpath method because the response parameter of a downloader middleware's process_response method is of type scrapy.http.Response (see the documentation). Only scrapy.http.TextResponse and its subclasses, such as scrapy.http.HtmlResponse, provide xpath. So before calling xpath, create an HtmlResponse object from the response. The corresponding part of your class would become:

...
# Wrap the plain Response so that .xpath() is available (needs "import scrapy").
new_response = scrapy.http.HtmlResponse(response.url, body=response.body)
if new_response.xpath(xpath).extract():
    found = True
    break
...
