Reputation: 233
My company has a pretty complicated web of Django models that I don't deal too much with. But sometimes I need to do queries on it. One that I'm doing right now is taking an inconveniently long time. So because I'm not the best at understanding how to use annotations effectively and I don't really understand subqueries at all (and probably some other key Django stuff) I was hoping someone here could help figure out how to do a better job at getting this result quicker.
Here's a facsimile of the relevant models in our database.
class Company(models.Model):
name = models.CharField(max_length=255)
@property
def active_humans(self):
if hasattr(self, '_active_humans'):
return self._active_humans
else:
self._active_humans = Human.objects.filter(active=True, departments__company=self).distinct()
return self._active_humans
class Department(models.Model):
name = models.CharField(max_length=225)
company = models.ForeignKey(
'muh_project.Company',
related_name="departments",
on_delete=models.PROTECT
)
humans = models.ManyToManyField('muh_project.Human', through='muh_project.Job', related_name='departments')
class Job(models.Model):
name = models.CharField(max_length=225)
department = models.ForeignKey(
'muh_project.Department',
on_delete=models.PROTECT
)
human = models.ForeignKey(
'muh_project.Human',
on_delete=models.PROTECT
)
class Human(models.Model):
active = models.BooleanField(default=True)
@property
def fixed_happy_dogs(self):
return self.solutions.filter(is_neutered_spayed=True, disposition="happy")
class Dog(models.Model):
is_neutered_spayed = models.BooleanField(default=True)
disposition = models.CharField(max_length=225)
age = models.IntegerField()
human = models.ForeignKey(
'muh_project.Human',
related_name="dogs",
on_delete=models.PROTECT
)
human_job = models.ForeignKey(
'muh_project.Job',
blank=True,
null=True,
on_delete=models.PROTECT
)
What I'm trying to do (in the language of this silly toy example) is to get the number of humans with at least one of a certain type of dog for each of some companies. So what I'm doing is running this.
rows = []
company_type = "Tech"
fixed_happy_dogs = Dog.objects.filter(is_neutered_spayed=True, disposition="happy")
old_dogs = fixed_happy_dogs.filter(age__gte=7)
companies = Company.objects.filter(name__icontains=company_type)
for company in companies.order_by('id'):
humans = company.active_humans
num_humans = humans.distinct().count()
humans_with_fixed_happy_dogs = humans.filter(dogs__in=fixed_happy_dogs).distinct().count()
humans_with_old_dogs = humans.filter(dogs__in=old_dogs).distinct().count()
rows.append(f'{company.id};{num_humans};{humans_with_fixed_happy_dogs};{humans_with_old_dogs}')
It generally takes anywhere from 45 - 120 seconds to run depending on how many companies I run it over. I'd like to cut that down. I do need the final result as a list of strings as shown.
Upvotes: 0
Views: 52
Reputation: 9359
One low-hanging fruit would be to add db index to the column Dog.disposition
, since it's being used in the .filter()
statement, and it looks like it needs to do sequence scan over the table (each time it goes through the for
loop).
For this task specifically I'd recommend to use Django Debug Toolbar where you can see all SQL queries, which can help you to pinpoint the slowest ones, and use EXPLAIN to see what goes wrong there.
Upvotes: 2