Reputation: 3665
We have a pair of models that look (roughly) like this:
class Machine(models.Model):
machine_id = models.CharField(max_length=10)
# Other irrelevant fields
@property
def latest_update(self):
if self.machineupdate_set.count() == 0:
return None
return self.machineupdate_set.order_by('-update_time')[:1].get()
class MachineUpdate(models.Model):
machine = models.ForeignKey(Machine)
update_time = models.DateTimeField(auto_now_add=True)
# Other irrelevant fields
Whenever we load Machine
s from the database, we always end up using the latest_update
for that machine. When we first implemented this, we had a lot of machines and a fairly small number of updates per machine, so to improve performance (by reducing query count) we added a simple default prefetch to the model manager for Machine
:
class MachineManager(models.Manager):
def get_queryset(self):
return super(MachineManager, self).get_queryset().prefetch_related('machineupdate_set')
However, things have changed and now we have huge numbers of updates related to each machine, and the prefetch query is starting to become a problem (both in terms of long query execution time and memory consumption).
We are looking for a smarter way of pre-fetching the required data, as all we really need to pre-fetch is the latest update per machine, not all of them. Having looked at the Django prefetch_related docs it looked like we could change get_queryset
in our MachineManager
to something like this:
def get_queryset(self):
latest_update_query = MachineUpdate.objects.order_by('-update_time')[:1]
latest_update_prefetch = models.Prefetch('machineupdate_set', queryset=latest_update_query, to_attr='_latest_update')
return super(MachineManager, self).get_queryset().prefetch_related(latest_update_prefetch)
and then modify latest_update
to use the new attribute populated by the pre-fetch. However, this does not work because whenever we filter a Machine
query using this we get an error: AssertionError: Cannot filter a query once a slice has been taken.
Can anyone suggest a solution to this problem, so that we can efficiently load the latest_update
for each machine? We are unsure how to fix the issue we have with the above attempt at pre-fetching the latest updates.
(FYI - we have considered adding a is_latest_update
boolean field to MachineUpdate
which we can filter on, or alternatively a latest_update
foreign key reference on Machine
, however we want to avoid having to maintain this redundant information).
Upvotes: 4
Views: 2323
Reputation: 16050
I see that MachineUpdate.update_time
has auto_now_add=True
. So we can use Max(MachineUpdate.id)
per Machine
group to get the last MachineUpdate
. Right? If that's True
checkout the following code:
class MachineManager(models.Manager):
pass
class MachineQueryset(models.QuerySet):
def with_last_machineupdate(self):
return self.prefetch_related(models.Prefetch('machineupdate_set',
queryset=MachineUpdate.objects.filter(
id__in=Machine.objects \
.annotate(last_machineupdate_id=models.Max('machineupdate__id')) \
.values_list('last_machineupdate_id', flat=True) \
),
#notice the list word
to_attr='last_machineupdate_list'
))
class Machine(models.Model):
machine_id = models.CharField(max_length=10)
objects = MachineManager.from_queryset(MachineQueryset)()
@property
def latest_update(self):
if hasattr(self, 'last_machineupdate_list') and len(self.last_machineupdate_list) > 0:
return self.last_machineupdate_list[0]
return None
class MachineUpdate(models.Model):
machine = models.ForeignKey(Machine)
update_time = models.DateTimeField(auto_now_add=True)
def __unicode__(self):
return str(self.update_time)
Usage:
machines = Machine.objects.filter(...).with_last_machineupdate()
If that's not the case, e.g. we can't use Max('machineupdate__id')
and we need to stick with update_time
field. Then a slightly more optimized solution (but still getting all MachineUpdates
per Machine
) looks like this:
class MachineManager(models.Manager):
def get_queryset(self):
return super(MachineManager, self).get_queryset() \
.prefetch_related(models.Prefetch('machineupdate_set',
queryset=MachineUpdate.objects.order_by('-update_time')
))
class Machine(models.Model):
machine_id = models.CharField(max_length=10)
objects = MachineManager()
@property
def latest_update(self):
#this will not make queries
machine_updates = self.machineupdate_set.all()
if len(machine_updates) > 0:
return machine_updates[0]
return None
Upvotes: 5