Reputation: 691
I'm trying to bulk update all of my records in table A with the count of their related records in table B.
I'd like to do something like:
from django.db.models import Subquery, OuterRef, Count
table_b_subquery = TableB.objects.filter(a_id=OuterRef('id'))
TableA.objects.all().update(table_b_count=Count(Subquery(table_b_subquery)))
That would be the equivalent of this NON-BULK method:
# Non-Bulk
for record in TableA.objects.all():
record.table_b_count = record.table_b_set.count()
The error I get trying the bulk method is:
*** django.core.exceptions.FieldError: Aggregate functions are not allowed in this query
How do I do a seemingly simply count of related records in a bulk update? Ideally I'd like to apply a simple field filter to the count of table B as well.
Upvotes: 3
Views: 6279
Reputation: 691
Endre's answer is what I'm looking for! I just found a new feature in Django 2.2 that might be worth using though and I'm conflicted. I'm not sure which is better.
Django 2.2 has bulk_update
So with bulk_update
for my question, I'd do:
records = []
for record in TableA.objects.all():
counted_record = record.table_b_count = record.table_b_set.count()
records.append(counted_record)
TableA.objects.bulk_update(records, ['table_b_count'], batch_size=100000)
# This is acting on Table A with 1,000,000 rows and Table B with 5,000,000 rows.
Anyone have thoughts on whether Endre's answer or the new Django 2.2 method is better in this case, with millions of rows?
Upvotes: 3
Reputation: 5730
You need a subquery for the count (rather than just the countable objects), which is a bit of a hassle as regular aggregate queries like count()
or aggregate()
tend to execute immediately, whereas subqueries need to be lazy. That's why the annotate()
workaround below is needed:
from django.db.models import Subquery, OuterRef, Count
table_b_subquery = Subquery(TableB.objects
.filter(a_id=OuterRef('id'))
.values('a_id')
.annotate(cnt=Count('a_id'))
.values('cnt')
)
TableA.objects.update(table_b_count=table_b_subquery)
Upvotes: 6