Reputation: 2360
According to (my reading of) the official docs here:
https://docs.djangoproject.com/en/dev/ref/models/querysets/#when-querysets-are-evaluated
a Django QuerySet should be cached when you evaluate it. But that doesn't seem to be the case. In the example that follows, TrackingImport is a model with a very large table behind it. (Output slightly edited for brevity.)
recs = TrackingImport.objects.filter(...stuff...)
In [102]: time(recs[0])
Wall time: 1.84 s
In [103]: time(recs[0])
Wall time: 1.84 s
Calling len() seems to work as advertised:
In [104]: len(recs)
Out[104]: 1823
In [105]: time(recs[0])
Wall time: 0.00 s
I don't get why indexing into the QuerySet didn't cache its results. It had to evaluate it, right? So what am I missing?
Upvotes: 2
Views: 1005
Reputation: 4467
Reading the source code (django.db.models.query) makes this clear. Here is __getitem__ from Django 1.3.4's query.py:
def __getitem__(self, k):
    """
    Retrieves an item or slice from the set of results.
    """
    if not isinstance(k, (slice, int, long)):
        raise TypeError
    assert ((not isinstance(k, slice) and (k >= 0))
            or (isinstance(k, slice) and (k.start is None or k.start >= 0)
                and (k.stop is None or k.stop >= 0))), \
            "Negative indexing is not supported."
    if self._result_cache is not None:
        if self._iter is not None:
            # The result cache has only been partially populated, so we may
            # need to fill it out a bit more.
            if isinstance(k, slice):
                if k.stop is not None:
                    # Some people insist on passing in strings here.
                    bound = int(k.stop)
                else:
                    bound = None
            else:
                bound = k + 1
            if len(self._result_cache) < bound:
                self._fill_cache(bound - len(self._result_cache))
        return self._result_cache[k]
    if isinstance(k, slice):
        qs = self._clone()
        if k.start is not None:
            start = int(k.start)
        else:
            start = None
        if k.stop is not None:
            stop = int(k.stop)
        else:
            stop = None
        qs.query.set_limits(start, stop)
        return k.step and list(qs)[::k.step] or qs
    try:
        qs = self._clone()
        qs.query.set_limits(k, k + 1)
        return list(qs)[0]
    except self.model.DoesNotExist, e:
        raise IndexError(e.args)
If you have not yet iterated over the queryset, _result_cache is None, so recs[0] skips the cached branch and falls through to these lines:
try:
    qs = self._clone()
    qs.query.set_limits(k, k + 1)
    return list(qs)[0]
except self.model.DoesNotExist, e:
    raise IndexError(e.args)
In this branch _result_cache is never set: the query runs on a clone limited to a single row, and the result is returned without being stored on the original queryset. That's why every call to recs[0] takes the same time.
Calling len(recs), on the other hand, goes through __len__:
def __len__(self):
    # Since __len__ is called quite frequently (for example, as part of
    # list(qs), we make some effort here to be as efficient as possible
    # whilst not messing up any existing iterators against the QuerySet.
    if self._result_cache is None:
        if self._iter:
            self._result_cache = list(self._iter)
        else:
            self._result_cache = list(self.iterator())
    elif self._iter:
        self._result_cache.extend(self._iter)
    return len(self._result_cache)
Now _result_cache is populated, so the next recs[0] is served straight from the cache:
if self._result_cache is not None:
    ....
    return self._result_cache[k]
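To see the two code paths side by side without a database, here is a minimal toy sketch. ToyQuerySet, _run_query and query_count are names I made up for illustration; this is not Django code. It only mimics the caching rules quoted above, handles integer indexes only, and counts how often the fake "database" is hit.

```python
class ToyQuerySet:
    """Hypothetical stand-in that mimics only the caching rules quoted above."""

    def __init__(self, rows):
        self._rows = list(rows)     # pretend this is the database table
        self._result_cache = None   # same attribute name as in query.py
        self.query_count = 0        # number of simulated database hits

    def _run_query(self, start=None, stop=None):
        # Hypothetical helper: every call counts as one database query.
        self.query_count += 1
        return self._rows[start:stop]

    def __getitem__(self, k):
        if self._result_cache is not None:
            # Cache already filled (e.g. by len()): no query needed.
            return self._result_cache[k]
        # No cache yet: run a one-row query and do NOT store the result.
        return self._run_query(k, k + 1)[0]

    def __len__(self):
        if self._result_cache is None:
            # Full evaluation: this is what populates the cache.
            self._result_cache = self._run_query()
        return len(self._result_cache)


recs = ToyQuerySet(["a", "b", "c"])
recs[0]
recs[0]
print(recs.query_count)  # 2: each index ran its own query
len(recs)
print(recs.query_count)  # 3: len() evaluated everything once
recs[0]
print(recs.query_count)  # still 3: served from the cache
```

The same pattern applies to the real thing: if you plan to index recs repeatedly, evaluate it fully once first (as len(recs) did in the question) so later lookups come from _result_cache.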
The source code never lies, so it's worth reading when the documentation doesn't answer your question.
Upvotes: 7