got it--thanks
got it--thanks

Reputation: 233

Speeding up a Django query

My company has a pretty complicated web of Django models that I don't deal too much with. But sometimes I need to do queries on it. One that I'm doing right now is taking an inconveniently long time. So because I'm not the best at understanding how to use annotations effectively and I don't really understand subqueries at all (and probably some other key Django stuff) I was hoping someone here could help figure out how to do a better job at getting this result quicker.

Here's a facsimile of the relevant models in our database.

class Company(models.Model):
    name = models.CharField(max_length=255)

    @property
    def active_humans(self):
        if hasattr(self, '_active_humans'):
            return self._active_humans
        else:
            self._active_humans = Human.objects.filter(active=True, departments__company=self).distinct()
            return self._active_humans

class Department(models.Model):
    name = models.CharField(max_length=225)
    company = models.ForeignKey(
        'muh_project.Company',
        related_name="departments",
        on_delete=models.PROTECT
    )
    humans = models.ManyToManyField('muh_project.Human', through='muh_project.Job', related_name='departments')

class Job(models.Model):
    name = models.CharField(max_length=225)
    department = models.ForeignKey(
        'muh_project.Department',
        on_delete=models.PROTECT
    )
    human = models.ForeignKey(
        'muh_project.Human',
        on_delete=models.PROTECT
    )

class Human(models.Model):
    active = models.BooleanField(default=True)

    @property
    def fixed_happy_dogs(self):
        return self.solutions.filter(is_neutered_spayed=True, disposition="happy")

class Dog(models.Model):
    is_neutered_spayed = models.BooleanField(default=True)
    disposition = models.CharField(max_length=225)
    age = models.IntegerField()

    human = models.ForeignKey(
        'muh_project.Human',
        related_name="dogs",
        on_delete=models.PROTECT
    )
    human_job = models.ForeignKey(
        'muh_project.Job',
        blank=True,
        null=True,
        on_delete=models.PROTECT
    )

What I'm trying to do (in the language of this silly toy example) is to get the number of humans with at least one of a certain type of dog for each of some companies. So what I'm doing is running this.

rows = []
company_type = "Tech"
fixed_happy_dogs = Dog.objects.filter(is_neutered_spayed=True, disposition="happy")
old_dogs = fixed_happy_dogs.filter(age__gte=7)
companies = Company.objects.filter(name__icontains=company_type)
for company in companies.order_by('id'):
    humans = company.active_humans
    num_humans = humans.distinct().count()
    humans_with_fixed_happy_dogs = humans.filter(dogs__in=fixed_happy_dogs).distinct().count()
    humans_with_old_dogs = humans.filter(dogs__in=old_dogs).distinct().count()
    rows.append(f'{company.id};{num_humans};{humans_with_fixed_happy_dogs};{humans_with_old_dogs}')

It generally takes anywhere from 45 - 120 seconds to run depending on how many companies I run it over. I'd like to cut that down. I do need the final result as a list of strings as shown.

Upvotes: 0

Views: 52

Answers (1)

yedpodtrzitko
yedpodtrzitko

Reputation: 9359

One low-hanging fruit would be to add db index to the column Dog.disposition, since it's being used in the .filter() statement, and it looks like it needs to do sequence scan over the table (each time it goes through the for loop).

For this task specifically I'd recommend to use Django Debug Toolbar where you can see all SQL queries, which can help you to pinpoint the slowest ones, and use EXPLAIN to see what goes wrong there.

Upvotes: 2

Related Questions