hgcrpd

Reputation: 1900

Django - performance of using related model fields

Suppose I have two related models

from django.db import models

class Foo(models.Model):
    value = models.FloatField()

class Bar(models.Model):
    multiplier = models.FloatField()
    foo = models.ForeignKey(Foo, related_name="bars", on_delete=models.CASCADE)

    def multiply(self):
        # Reads a field from the related Foo, which can hit the database
        # if the relation is not already cached
        return self.foo.value * self.multiplier

An instance of Foo will frequently have many related instances of Bar, but some of the information needed for a calculation that Bar performs is stored on Foo (because it is the same for all of the related Bars).

The problem is when I do something like this:

foo = Foo.objects.latest()
[x.multiply() for x in foo.bars.all()]

It ends up hitting the database a lot because every Bar object in foo.bars.all() queries the database for the Foo object. So, if I have 10 Bars, then I will incur 11 database queries (1 to get the queryset with 10 bars, and 1 for each Bar object reaching back to get self.foo.value). Using select_related() doesn't seem to help.
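One way to confirm the query count is to inspect Django's query log; a minimal sketch, assuming DEBUG=True (Django only records queries in that case):

from django.db import connection, reset_queries

reset_queries()
foo = Foo.objects.latest()                        # 1 query for the Foo
results = [x.multiply() for x in foo.bars.all()]  # 1 query for the Bars + 1 per Bar for self.foo
print(len(connection.queries))                    # 12 with 10 Bars: the latest() query plus the 11 above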

My questions are: 1) Am I correct in thinking that a query-caching layer (e.g. Johnny Cache, Cache Machine on top of memcached) will solve this problem? 2) Is there a way of designing the object relationship that makes this operation more efficient without a cache?

Upvotes: 1

Views: 559

Answers (1)

voithos

Reputation: 70602

This is precisely the kind of situation that select_related and prefetch_related were created for. When you use them, Django's ORM employs one of two techniques to avoid redundant database queries: following foreign-key relations in a single JOIN (select_related), or fetching one-to-many / many-to-many relations in a separate query and caching the results on the objects (prefetch_related).

# Two queries: one for the latest Foo, one to prefetch its Bars
foo = Foo.objects.prefetch_related('bars').latest()

# No additional queries: the Bars (and their foo) are already cached
[x.multiply() for x in foo.bars.all()]
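And since the relation from Bar back to Foo is a foreign key, the select_related route works too if you start from the Bar side; a minimal sketch using the models from the question:

# One JOIN query: each Bar comes with its Foo attached,
# so multiply() never goes back to the database
bars = Bar.objects.filter(foo=foo).select_related('foo')
results = [bar.multiply() for bar in bars]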

Upvotes: 3
