Reputation: 10463
I am using scrapy.downloadermiddlewares.httpcache.HttpCacheMiddleware to cache Scrapy requests. I'd like it to cache a response only if its status is 200. Is that the default behavior, or do I need to set HTTPCACHE_IGNORE_HTTP_CODES to everything except 200?
Upvotes: 3
Views: 1030
Reputation: 1135
No, you do not need to do that. Instead, write a cache policy class and enable it in settings.py. I put the CachePolicy class in middlewares.py:
from scrapy.extensions.httpcache import DummyPolicy

class CachePolicy(DummyPolicy):
    def should_cache_response(self, response, request):
        # Cache only successful (200) responses
        return response.status == 200
Then update settings.py by appending the following line:
HTTPCACHE_POLICY = 'yourproject.middlewares.CachePolicy'
Upvotes: 3
Reputation: 21436
Yes. By default, HttpCacheMiddleware runs a DummyPolicy for the requests. It does nothing special on its own: it caches every response whose status is not listed in HTTPCACHE_IGNORE_HTTP_CODES, so you need to set that setting to everything except 200.
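Rather than typing out every status code by hand, the list can be built in settings.py. This is only a minimal sketch: the 100-599 range is an assumption meant to cover the standard status classes, so adjust it if your target server returns something unusual.

```python
# settings.py (sketch): ignore every HTTP status code except 200.
# The 100-599 range is an assumption covering the standard status classes.
HTTPCACHE_IGNORE_HTTP_CODES = [code for code in range(100, 600) if code != 200]
```

With this in place, the policy's "status not in ignore list" check should pass only for 200 responses.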
Here's the source for DummyPolicy; these are the lines that actually matter:
class DummyPolicy(object):
    def __init__(self, settings):
        self.ignore_http_codes = [int(x) for x in settings.getlist('HTTPCACHE_IGNORE_HTTP_CODES')]

    def should_cache_response(self, response, request):
        return response.status not in self.ignore_http_codes
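To see what that check implies, here is a small stand-alone sketch that mirrors the same logic without importing Scrapy. SketchPolicy is a hypothetical stand-in, not Scrapy's class, and it takes a bare status code instead of a response object. It assumes the ignore list is empty when HTTPCACHE_IGNORE_HTTP_CODES is left unset.

```python
# Stand-in mirroring the DummyPolicy logic quoted above; not Scrapy's class.
class SketchPolicy:
    def __init__(self, ignore_http_codes):
        # Same normalisation as the quoted source: coerce every entry to int
        self.ignore_http_codes = [int(x) for x in ignore_http_codes]

    def should_cache_response(self, status):
        # Cache anything whose status is not explicitly ignored
        return status not in self.ignore_http_codes


# Assumption: an empty ignore list, i.e. the setting was never configured
default_policy = SketchPolicy([])
# Ignore everything in the (assumed) 100-599 range except 200
strict_policy = SketchPolicy(c for c in range(100, 600) if c != 200)

cached_by_default = default_policy.should_cache_response(404)
cached_when_strict = strict_policy.should_cache_response(404)
ok_when_strict = strict_policy.should_cache_response(200)
```

Under these assumptions, the empty list lets a 404 through to the cache, while the strict list rejects it and still accepts 200.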
Alternatively, you can extend DummyPolicy and override should_cache_response() to check for 200 explicitly, i.e. return response.status == 200, and then set the subclass as your cache policy via the HTTPCACHE_POLICY setting.
Upvotes: 0