Starcat
Starcat

Reputation: 691

Django 2.1+ bulk update records with the count of their related records?

I'm trying to bulk update all of my records in table A with the count of their related records in table B.

I'd like to do something like:

from django.db.models import Subquery, OuterRef, Count


table_b_subquery = TableB.objects.filter(a_id=OuterRef('id'))

TableA.objects.all().update(table_b_count=Count(Subquery(table_b_subquery)))

That would be the equivalent of this NON-BULK method:

# Non-Bulk

for record in TableA.objects.all():
  record.table_b_count = record.table_b_set.count()

The error I get trying the bulk method is:

*** django.core.exceptions.FieldError: Aggregate functions are not allowed in this query

How do I do a seemingly simply count of related records in a bulk update? Ideally I'd like to apply a simple field filter to the count of table B as well.

Upvotes: 3

Views: 6279

Answers (2)

Starcat
Starcat

Reputation: 691

Endre's answer is what I'm looking for! I just found a new feature in Django 2.2 that might be worth using though and I'm conflicted. I'm not sure which is better.

Django 2.2 has bulk_update

https://docs.djangoproject.com/en/2.2/ref/models/querysets/#django.db.models.query.QuerySet.bulk_update

So with bulk_update for my question, I'd do:

records = []
for record in TableA.objects.all():
  counted_record = record.table_b_count = record.table_b_set.count()
  records.append(counted_record)

TableA.objects.bulk_update(records, ['table_b_count'], batch_size=100000) 

# This is acting on Table A with 1,000,000 rows and Table B with 5,000,000 rows.

Anyone have thoughts on whether Endre's answer or the new Django 2.2 method is better in this case, with millions of rows?

Upvotes: 3

Endre Both
Endre Both

Reputation: 5730

You need a subquery for the count (rather than just the countable objects), which is a bit of a hassle as regular aggregate queries like count() or aggregate() tend to execute immediately, whereas subqueries need to be lazy. That's why the annotate() workaround below is needed:

from django.db.models import Subquery, OuterRef, Count

table_b_subquery = Subquery(TableB.objects
    .filter(a_id=OuterRef('id'))
    .values('a_id')
    .annotate(cnt=Count('a_id'))
    .values('cnt')
)

TableA.objects.update(table_b_count=table_b_subquery)

Upvotes: 6

Related Questions