Reputation: 191
I'm annotating a queryset like so:
class ItemQuerySet(models.QuerySet):
def annotate_subitem_stats(self):
return self.annotate(
count_subitems=Count('subitems'),
has_sent_subitems=Case(
When(subitems__status=Status.sent, then=Value(True)),
default=Value(False)
),
)
In this example, SubItem
is a model with a foreign key to Item
.
A strange behaviour happens when I run this code. Suppose we have 1 Item
and 2 SubItem
s linked to it. One subitem has status sent and the other doesn't. When I run annotate on the queryset, the queryset returns the item twice, one with has_sent_subitems
set to True
and the other set to False
. The other strange thing is that, one duplicate has count_subitems == 1
, and the other has count_subitems == 1
, as if the queryset has split item into two rows, one where status == 'sent'
and the other where status != 'sent'
.
This is basically what the annotated queryset looks like:
[
{
'name': 'Item Name',
'count_subitems': 1,
'has_sent_subitem': False
},
{
'name': 'Item Name',
'count_subitems': 1,
'has_sent_subitem': True
}
]
This is what the database looks like, using pseudocode:
item = Item()
SubItem(item=item, status=draft)
SubItem(item=item, status=sent)
I'm pretty sure this has to do with line When(subitems__status=Status.sent, then=Value(True)),
. Is there any way I can make that line check if only 1 item has status sent, and then set the annotation to true and move on?
P.S. Using .distinct()
didn't work. I can't use .distinct(field)
, because annotate() + distinct(fields) is not implemented.
Upvotes: 4
Views: 2828
Reputation: 2797
For some extra context: I think this is a known and documented limitation (though the docs are easy to miss). The docs on aggreation have this section (copied from Django 4.2 docs):
Combining multiple aggregations
Combining multiple aggregations with annotate() will yield the wrong results because joins are used instead of subqueries:
>>> book = Book.objects.first() >>> book.authors.count() 2 >>> book.store_set.count() 3 >>> q = Book.objects.annotate(Count('authors'), Count('store')) >>> q[0].authors__count 6 >>> q[0].store__count 6
For most aggregates, there is no way to avoid this problem, however, the Count aggregate has a distinct parameter that may help:
>>> q = Book.objects.annotate(Count('authors', distinct=True), Count('store', distinct=True)) >>> q[0].authors__count 2 >>> q[0].store__count 3
Another way to circumvent the limitation is to put the aggregate in a subquery. I previously explained how to use the django-sql-utils package for this:
For anyone else running into this, a reasonable workaround seems to be to use subqueries for aggregating annotations. This is a bit verbose/hacky in Django currently, as shown by the multitude of approaches in the stackoverflow link from Antoine's comment. However, I've successfully used the django-sql-utils package for this just now. That sounds a bit bulky, but it just has two utilities, one of which is a SubqueryAggregate class (with derived SubqueryCount, SubquerySum, etc.) that make converting a regular joining aggregate into a subquery aggregate easy and concise, without changing the structure much.
For example taking the example from comment 66 and converting the second annotation to a subquery with django-sql-utils, you'd get:
Branch.objects.annotate( total=Sum('center__client__loan__amount'), repaid=SubquerySum('center__client__loan__payment_schedule__payments__principal'), )
There is also a bug report tracking this limitation.
Upvotes: 0
Reputation: 12078
You can instead use Exists
subquery for this, to avoid the join caused by subitems__status
, so:
from django.db.models import Exists, OuterRef
class ItemQuerySet(models.QuerySet):
def annotate_subitem_stats(self):
return self.annotate(
count_subitems=Count('subitems'),
has_sent_subitems=Exists(
SubItem.objects.filter(item=OuterRef('pk'), status=Status.sent)
),
)
Upvotes: 7