Reputation: 64719
How do you cache a paginated Django queryset, specifically in a ListView?
I noticed one query was taking a long time to run, so I'm attempting to cache it. The queryset is huge (over 100k records), so I'm attempting to only cache paginated subsections of it. I can't cache the entire view or template because there are sections that are user/session specific and need to change constantly.
ListView has a couple standard methods for retrieving the queryset, get_queryset()
, which returns the non-paginated data, and paginate_queryset()
, which filters it by the current page.
I first tried caching the query in get_queryset()
, but quickly realized calling cache.set(my_query_key, super(MyView, self).get_queryset())
was causing the entire query to be serialized.
So then I tried overriding paginate_queryset()
like:
import time
from functools import partial
from django.core.cache import cache
from django.views.generic import ListView
class MyView(ListView):
...
def paginate_queryset(self, queryset, page_size):
cache_key = 'myview-queryset-%s-%s' % (self.page, page_size)
print 'paginate_queryset.cache_key:',cache_key
t0 = time.time()
ret = cache.get(cache_key)
if ret is None:
print 're-caching'
ret = super(MyView, self).paginate_queryset(queryset, page_size)
cache.set(cache_key, ret, 60*60)
td = time.time() - t0
print 'paginate_queryset.time.seconds:',td
(paginator, page, object_list, other_pages) = ret
print 'total objects:',len(object_list)
return ret
However, this takes almost a minute to run, even though only 10 objects are retrieved, and every requests shows "re-caching", implying nothing is being saved to cache.
My settings.CACHE
looks like:
CACHES = {
'default': {
'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
'LOCATION': '127.0.0.1:11211',
}
}
and service memcached status
shows memcached is running and tail -f /var/log/memcached.log
shows absolutely nothing.
What am I doing wrong? What is the proper way to cache a paginated query so that the entire queryset isn't retrieved?
Edit: I think their may be a bug in either memcached or the Python wrapper. Django appears to support two different memcached backends, one using python-memcached and one using pylibmc. The python-memcached seems to silently hide the error caching the paginate_queryset()
value. When I switched to the pylibmc backend, now I get an explicit error message "error 10 from memcached_set: SERVER ERROR" tracing back to django/core/cache/backends/memcached.py in set, line 78.
Upvotes: 5
Views: 7081
Reputation: 948
Here's an explanation of how to use the wonderful answer by Todor to cache pagination within ListView
. It is assumed that you have several ListView
in your app. Each of them requires its own distinct cache_key
. You add paginator_class = CachedPaginator
and overwrite the get_paginator
function by parent class.
from myapp.utils import CachedPaginator
class ModelAView(ListView):
model = ModelA
template_name = "model_a.html"
paginator_class = CachedPaginator # instead of default Paginator
paginate_by = 20
def get_paginator(
self, queryset, per_page, orphans=0, allow_empty_first_page=True, **kwargs
):
paginator_cache_key = "model_a_" + str(self.kwargs["model_a_pk"])
return self.paginator_class(
queryset,
per_page,
orphans=orphans,
allow_empty_first_page=allow_empty_first_page,
cache_key=paginator_cache_key,
**kwargs,
)
Upvotes: 0
Reputation: 41
I wanted to paginate my infinite scrolling view on my home page and this is the solution I came up with. It's a mix of Django CCBVs and the author's initial solution.
The response times, however, didn't improve as much as I would've hoped for but that's probably because I am testing it on my local with just 6 posts and 2 users haha.
# Import
from django.core.cache import cache
from django.core.paginator import InvalidPage
from django.views.generic.list import ListView
from django.http Http404
class MyListView(ListView):
template_name = 'MY TEMPLATE NAME'
model = MY POST MODEL
paginate_by = 10
def paginate_queryset(self, queryset, page_size):
"""Paginate the queryset"""
paginator = self.get_paginator(
queryset, page_size, orphans=self.get_paginate_orphans(),
allow_empty_first_page=self.get_allow_empty())
page_kwarg = self.page_kwarg
page = self.kwargs.get(page_kwarg) or self.request.GET.get(page_kwarg) or 1
try:
page_number = int(page)
except ValueError:
if page == 'last':
page_number = paginator.num_pages
else:
raise Http404(_("Page is not 'last', nor can it be converted to an int."))
try:
page = paginator.page(page_number)
cache_key = 'mylistview-%s-%s' % (page_number, page_size)
retreive_cache = cache.get(cache_key)
if retreive_cache is None:
print('re-caching')
retreive_cache = super(MyListView, self).paginate_queryset(queryset, page_size)
# Caching for 1 day
cache.set(cache_key, retreive_cache, 86400)
return retreive_cache
except InvalidPage as e:
raise Http404(_('Invalid page (%(page_number)s): %(message)s') % {
'page_number': page_number,
'message': str(e)
})
Upvotes: 0
Reputation: 16010
You can extend the Paginator
to support caching by a provided cache_key
.
A blog post about usage and implementation of a such CachedPaginator
can be found here. The source code is posted at djangosnippets.org (here is a web-acrhive link because the original is not working).
However I will post a slightly modificated example from the original version, which can not only cache objects per page, but the total count too. (sometimes even the count can be an expensive operation).
from django.core.cache import cache
from django.utils.functional import cached_property
from django.core.paginator import Paginator, Page, PageNotAnInteger
class CachedPaginator(Paginator):
"""A paginator that caches the results on a page by page basis."""
def __init__(self, object_list, per_page, orphans=0, allow_empty_first_page=True, cache_key=None, cache_timeout=300):
super(CachedPaginator, self).__init__(object_list, per_page, orphans, allow_empty_first_page)
self.cache_key = cache_key
self.cache_timeout = cache_timeout
@cached_property
def count(self):
"""
The original django.core.paginator.count attribute in Django1.8
is not writable and cant be setted manually, but we would like
to override it when loading data from cache. (instead of recalculating it).
So we make it writable via @cached_property.
"""
return super(CachedPaginator, self).count
def set_count(self, count):
"""
Override the paginator.count value (to prevent recalculation)
and clear num_pages and page_range which values depend on it.
"""
self.count = count
# if somehow we have stored .num_pages or .page_range (which are cached properties)
# this can lead to wrong page calculations (because they depend on paginator.count value)
# so we clear their values to force recalculations on next calls
try:
del self.num_pages
except AttributeError:
pass
try:
del self.page_range
except AttributeError:
pass
@cached_property
def num_pages(self):
"""This is not writable in Django1.8. We want to make it writable"""
return super(CachedPaginator, self).num_pages
@cached_property
def page_range(self):
"""This is not writable in Django1.8. We want to make it writable"""
return super(CachedPaginator, self).page_range
def page(self, number):
"""
Returns a Page object for the given 1-based page number.
This will attempt to pull the results out of the cache first, based on
the requested page number. If not found in the cache,
it will pull a fresh list and then cache that result + the total result count.
"""
if self.cache_key is None:
return super(CachedPaginator, self).page(number)
# In order to prevent counting the queryset
# we only validate that the provided number is integer
# The rest of the validation will happen when we fetch fresh data.
# so if the number is invalid, no cache will be setted
# number = self.validate_number(number)
try:
number = int(number)
except (TypeError, ValueError):
raise PageNotAnInteger('That page number is not an integer')
page_cache_key = "%s:%s:%s" % (self.cache_key, self.per_page, number)
page_data = cache.get(page_cache_key)
if page_data is None:
page = super(CachedPaginator, self).page(number)
#cache not only the objects, but the total count too.
page_data = (page.object_list, self.count)
cache.set(page_cache_key, page_data, self.cache_timeout)
else:
cached_object_list, cached_total_count = page_data
self.set_count(cached_total_count)
page = Page(cached_object_list, number, self)
return page
Upvotes: 5
Reputation: 64719
The problem turned out to be a combination of factors. Mainly, the result returned by the paginate_queryset()
contains a reference to the unlimited queryset, meaning it's essentially uncachable. When I called cache.set(mykey, (paginator, page, object_list, other_pages))
, it was trying to serialize thousands of records instead of just the page_size
number of records I was expecting, causing the cached item to exceed memcached's limits and fail.
The other factor was the horrible default error reporting in the memcached/python-memcached, which silently hides all errors and turns cache.set() into a nop if anything goes wrong, making it very time-consuming to track down the problem.
I fixed this by essentially rewriting paginate_queryset()
to ditch Django's builtin paginator functionality altogether and calculate the queryset myself with:
object_list = queryset[page_size*(page-1):page_size*(page-1)+page_size]
and then caching that object_list
.
Upvotes: 2