Mikko Vedru
Mikko Vedru

Reputation: 353

Python: Getting rid of nested for loops while accessing database (Models? View?)

I am trying to solve performance issues at a Django-based web site, with very little knowledge of Django and Python syntax. I seem to have correctly identified the problem. I also seem to know what to do next, but I can't get a grasp on the Python/Django syntax to make everything work.

class Company(models.Model):
    name = models.CharField(max_length=100)
    bic = models.CharField(max_length=100, blank=True)

    def get_order_count(self):
        return self.orders.count()

    def get_order_sum(self):
        total_sum = 0
        for contact in self.contacts.all():
            for order in contact.orders.all():
                total_sum += order.total
        return total_sum

class Contact(models.Model):
    company = models.ForeignKey(
        Company, related_name="contacts", on_delete=models.DO_NOTHING)
    first_name = models.CharField(max_length=100)
    last_name = models.CharField(max_length=100, blank=True)

    def get_order_count(self):
        return self.orders.count()

class Order(models.Model):
    order_number = models.CharField(max_length=100)
    company = models.ForeignKey(Company, related_name="orders", on_delete=models.DO_NOTHING)
    contact = models.ForeignKey(Contact, related_name="orders", on_delete=models.DO_NOTHING)
    total = models.DecimalField(max_digits=18, decimal_places=9)
    order_date = models.DateTimeField(null=True, blank=True)

    def __str__(self):
        return "%s" % self.order_number

My hunch is that the performance problems are caused by the nested loop in get_order_sum. And my solution is quite clear: nested "fors" should be replaced by a simple command, that uses aggregation and utilizes the database's own efficient internal SQL functionality. So in my mind, the solution should look something like this:

return self.contacts.all().orders.all().aggregate(Sum('total'))

The problem is that I can't figure out how to properly write what I want Django/Python to do. Please help me!

Or am I mistaken and is the problem (or part of it) in my View-code?

<table>
    <tr>
        <th>Name</th>
        <th>Order Count</th>
        <th>Order Sum</th>
        <th>Select</th>
    </tr>
    {% for company in company_list|slice:":100" %}
    <tr>
        <td>{{ company.name }}</td>
        <td>{{ company.orders.count }}</td>
        <td>{{ company.get_order_sum|floatformat:2 }}</td>
        <td><input type="checkbox" name="select{{company.pk}}" id=""></td>
    </tr>
    {% for contact in company.contacts.all %}
    <tr>
        <td>- {{ contact.first_name }} {{ contact.last_name }}</td>
        <td>Orders: {{ contact.orders.count }}</td>
        <td>&nbsp;</td>
        <td>&nbsp;</td>
    </tr>
    {% endfor %}
    {% endfor %}
</table>

I would also appreciate any other hints and opinions on how this code could be further improved (especially from the performance POV).

Upvotes: 1

Views: 164

Answers (2)

Endre Both
Endre Both

Reputation: 5730

The key is not to have excess database queries (this is not the same as having as few as possible) and, as you mentioned, to let the database do work it's good at.

In practical terms, a) all data you need should be in company_list by the time it gets to your template, so you don't have database queries sent from within the loops in your template, and b) company_list should be filled with data in an efficient way.

from django.db.models import Prefetch, Sum, Count

contacts_with_orders = Prefetch(
    'contacts',
    queryset=Contact.objects.annotate(order_count=Count('orders'))
)

company_list = (Company.objects
    .prefetch_related(contacts_with_orders)
    .annotate(order_sum=Sum('orders__total'),
              order_count=Count('orders'))
)

Now you can do your loops and access all required data without any further queries:

for company in company_list:
    company.name
    company.order_sum
    company.order_count
    for contact in company.contacts.all():
        contact.first_name
        contact.order_count

While this should be orders of magnitude faster than previously, it is still quite onerous. There is also some more room for optimization: If needed, you can shave off a bit more by querying only the columns you need instead of complete rows and by returning dictionaries instead of objects. See only(), values() and values_list() and the to_attr parameter in Prefetch.

Upvotes: 1

Ozgur Akcali
Ozgur Akcali

Reputation: 5492

As you guessed, your performance problem is most likely caused by get_order_sum method. You are running a query to get all contats of a company, then for each contact, you are running a query to get that contact's ordesr. You can find this sum with a single query, like this in Django:

from django.db.models import Sum
def get_order_sum(self):
    return self.contacts.aggregate(order_sum=Sum('orders__total')).get('order_sum')

Note that aggregate function returns a dictionary in the following format:

{
    'order_sum': 123
}

Upvotes: 1

Related Questions