Reputation: 327
I have a fairly simple model that's part of a double-entry bookkeeping system. Double entry just means that each transaction (JournalEntry) is made up of multiple LineItems. The LineItems should balance out to zero, reflecting the fact that money always comes out of one category (Ledger) and into another. The CR column is for money out, DR is money in (I think the CR and DR abbreviations come from Latin words and are a standard naming convention in accounting systems).
My JournalEntry model has a method called is_valid() which checks that the line items balance, plus a few other checks. However, the method is very database-expensive, and when I use it to check many entries at once the database can't cope.
Any suggestions on how I can optimise the queries within this method to reduce database load?
class JournalEntry(models.Model):
    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.PROTECT, null=True, blank=True)
    date = models.DateField(null=False, blank=False)

    # Make choiceset global so that it can be accessed in filters.py
    global JOURNALENTRY_TYPE_CHOICES
    JOURNALENTRY_TYPE_CHOICES = (
        ('BP', 'Bank Payment'),
        ('BR', 'Bank Receipt'),
        ('TR', 'Transfer'),
        ('JE', 'Journal Entry'),
        ('YE', 'Year End'),
    )

    type = models.CharField(
        max_length=2,
        choices=JOURNALENTRY_TYPE_CHOICES,
        blank=False,
        null=False,
        default='0'
    )
    description = models.CharField(max_length=255, null=True, blank=True)
    def __str__(self):
        if self.description:
            return self.description
        else:
            return 'Journal Entry ' + str(self.id)

    @property
    def is_valid(self):
        """Checks if Journal Entry has valid data integrity"""
        # NEEDS TO BE OPTIMISED AS PERFORMANCE IS BAD
        cr = LineItem.objects.filter(journal_entry=self.id).aggregate(Sum('cr'))
        dr = LineItem.objects.filter(journal_entry=self.id).aggregate(Sum('dr'))
        if dr['dr__sum'] != cr['cr__sum']:
            return "Line items do not balance"
        if self.lineitem_set.filter(cr__isnull=True, dr__isnull=True).exists():
            return "Empty line item(s)"
        if self.lineitem_set.filter(cr__isnull=False, dr__isnull=False).exists():
            return "CR and DR values present on same lineitem(s)"
        if (self.type == 'BR' or self.type == 'BP' or self.type == 'TR') and len(self.lineitem_set.all()) != 2:
            return 'Incorrect number of line items'
        if len(self.lineitem_set.all()) == 0:
            return 'Has zero line items'
        return True
class LineItem(models.Model):
    journal_entry = models.ForeignKey(JournalEntry, on_delete=models.CASCADE)
    ledger = models.ForeignKey(Ledger, on_delete=models.PROTECT)
    description = models.CharField(max_length=255, null=True, blank=True)
    project = models.ForeignKey(Project, on_delete=models.SET_NULL, null=True, blank=True)
    cr = models.DecimalField(max_digits=8, decimal_places=2, null=True, blank=True)
    dr = models.DecimalField(max_digits=8, decimal_places=2, null=True, blank=True)
    reconciliation_date = models.DateField(null=True, blank=True)

    #def __str__(self):
    #    return self.description

    class Meta(object):
        ordering = ['id']
Upvotes: 0
Views: 87
Reputation: 77932
First things first: if it's an expensive operation, it shouldn't be a property - not that this will change the execution time / db load, but at least it won't break the expectation that you're mostly doing a (relatively cheap) attribute access.
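As a minimal sketch of what I mean (the check_validity name is just an illustration, not something from your code):

    class JournalEntry(models.Model):
        # ... fields as above ...

        def check_validity(self):
            """Same checks as is_valid, but as an explicit method call
            so the cost is visible at the call site."""
            # ... same body as your current is_valid ...

    # callers now write entry.check_validity() instead of entry.is_valid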
With regard to possible optimisations, part of the cost is in the db roundtrips themselves (including the time spent in the Python code - ORM and db adapter), so the first thing would be to make as few queries as possible:
1/ Replacing len(self.lineitem_set.all()) with self.lineitem_set.count(), and avoiding calling it twice, could save some time already.
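For instance (untested; this just caches the count in a local variable):

    line_count = self.lineitem_set.count()  # one COUNT(*) query, no row fetching
    if self.type in ('BR', 'BP', 'TR') and line_count != 2:
        return 'Incorrect number of line items'
    if line_count == 0:
        return 'Has zero line items'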
2/ You could probably regroup the first two queries into a single one (not tested...):

    crdr = self.lineitem_set.aggregate(Sum('cr'), Sum('dr'))
    if crdr['dr__sum'] != crdr['cr__sum']:
        return "Line items do not balance"
and well, that's about all the simple obvious optimisations, and I don't think it will really solve your issue.
Next step would probably be to try a stored procedure doing the whole validation process - one single roundtrip, and possibly more room for db-level optimisations (depending on your db vendor).
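The calling side could look something like this (a sketch only - validate_journal_entry is a hypothetical procedure you'd have to write for your particular RDBMS):

    from django.db import connection

    def check_validity(self):
        with connection.cursor() as cursor:
            # Django's cursor exposes callproc(); the procedure itself
            # would run all the balance/shape checks server-side and
            # return an error message, or NULL if everything is fine.
            cursor.callproc('validate_journal_entry', [self.pk])
            row = cursor.fetchone()
        return row[0] if row and row[0] else True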
Then - assuming your db schema, settings, server etc. are fully optimised (which is a bit outside this site's on-topic policy) - the only solution left is denormalisation, either at the db level (safer) or at the django level using a local per-instance cache on your model - the issue being to make sure you properly invalidate this cache every time anything that might affect it changes.
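For the django-level variant, a minimal sketch using cached_property (_run_validation_checks is a hypothetical helper holding your current query logic; note that this only invalidates the one in-memory instance, which is exactly why invalidation is the hard part):

    from django.utils.functional import cached_property

    class JournalEntry(models.Model):
        # ...
        @cached_property
        def validation_result(self):
            return self._run_validation_checks()

    class LineItem(models.Model):
        # ...
        def save(self, *args, **kwargs):
            super().save(*args, **kwargs)
            # drop the cached value so the next access recomputes it
            self.journal_entry.__dict__.pop('validation_result', None)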
NB: actually I'm a bit surprised your db "can't cope" with this, as it doesn't seem _that_ heavy - but it of course depends on how many line items per journal entry you have (on average and worst case) in your production data.
More info about your chosen RDBMS and setup (same server or a distinct one - and if distinct, the network connectivity between the servers - available RAM, RDBMS settings, etc.) could probably help too. Even with the most optimised queries at the client level, there are limits to what your RDBMS can do... but then this becomes more of a sysadmin/dbadmin question.
EDIT
"Page load time is now long but it does complete. Yes 2000 records to list and execute the method on"
You mean you're executing this in a view on a queryset of 2000+ records? Well, I can well understand that it's a bit heavy - and not only on the database, FWIW.
I think you might be able to optimise this quite a bit further for this use case. The first option would be to make use of the queryset's select_related, prefetch_related, annotate and extra features, and if that's not enough, to go for raw SQL.
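For example, an annotate-based sketch (untested; again assumes Django 2.0+ for the filter argument to aggregates) that pulls everything the validation needs for all entries in one query, so the per-entry checks run in plain Python with no further db hits:

    from django.db.models import Count, Q, Sum

    entries = JournalEntry.objects.annotate(
        cr_total=Sum('lineitem__cr'),
        dr_total=Sum('lineitem__dr'),
        line_count=Count('lineitem'),
        empty_lines=Count('lineitem', filter=Q(lineitem__cr__isnull=True,
                                               lineitem__dr__isnull=True)),
    )

    for entry in entries:
        # all values come from the annotations - no queries inside the loop
        balanced = entry.cr_total == entry.dr_total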
Upvotes: 1