keyvanm
keyvanm

Reputation: 191

Django annotate returning duplicate entries

I'm annotating a queryset like so:

class ItemQuerySet(models.QuerySet):
    def annotate_subitem_stats(self):
        return self.annotate(
            count_subitems=Count('subitems'),
            has_sent_subitems=Case(
                When(subitems__status=Status.sent, then=Value(True)),
                default=Value(False)
            ),
        )

In this example, SubItem is a model with a foreign key to Item.

A strange behaviour happens when I run this code. Suppose we have 1 Item and 2 SubItems linked to it. One subitem has status sent and the other doesn't. When I run annotate on the queryset, the queryset returns the item twice, one with has_sent_subitems set to True and the other set to False. The other strange thing is that, one duplicate has count_subitems == 1, and the other has count_subitems == 1, as if the queryset has split item into two rows, one where status == 'sent' and the other where status != 'sent'.

This is basically what the annotated queryset looks like:

[
    {
        'name': 'Item Name',
        'count_subitems': 1,
        'has_sent_subitem': False
    },
    {
        'name': 'Item Name',
        'count_subitems': 1,
        'has_sent_subitem': True
    }
]

This is what the database looks like, using pseudocode:

item = Item()
SubItem(item=item, status=draft)
SubItem(item=item, status=sent)

I'm pretty sure this has to do with line When(subitems__status=Status.sent, then=Value(True)),. Is there any way I can make that line check if only 1 item has status sent, and then set the annotation to true and move on?

P.S. Using .distinct() didn't work. I can't use .distinct(field), because annotate() + distinct(fields) is not implemented.

Upvotes: 4

Views: 2828

Answers (2)

Matthijs Kooijman
Matthijs Kooijman

Reputation: 2797

For some extra context: I think this is a known and documented limitation (though the docs are easy to miss). The docs on aggreation have this section (copied from Django 4.2 docs):

Combining multiple aggregations

Combining multiple aggregations with annotate() will yield the wrong results because joins are used instead of subqueries:

>>> book = Book.objects.first()
>>> book.authors.count()
2
>>> book.store_set.count()
3
>>> q = Book.objects.annotate(Count('authors'), Count('store'))
>>> q[0].authors__count
6
>>> q[0].store__count
6

For most aggregates, there is no way to avoid this problem, however, the Count aggregate has a distinct parameter that may help:

>>> q = Book.objects.annotate(Count('authors', distinct=True), Count('store', distinct=True))
>>> q[0].authors__count
2
>>> q[0].store__count
3

Another way to circumvent the limitation is to put the aggregate in a subquery. I previously explained how to use the django-sql-utils package for this:

For anyone else running into this, a reasonable workaround seems to be to use subqueries for aggregating annotations. This is a bit verbose/hacky in Django currently, as shown by the multitude of approaches in the stackoverflow link from Antoine's comment. However, I've successfully used the django-sql-utils package for this just now. That sounds a bit bulky, but it just has two utilities, one of which is a SubqueryAggregate class (with derived SubqueryCount, SubquerySum, etc.) that make converting a regular joining aggregate into a subquery aggregate easy and concise, without changing the structure much.

For example taking the example from comment 66 and converting the second annotation to a subquery with django-sql-utils, you'd get:

Branch.objects.annotate(
    total=Sum('center__client__loan__amount'),
    repaid=SubquerySum('center__client__loan__payment_schedule__payments__principal'),
)

There is also a bug report tracking this limitation.

Upvotes: 0

Brian Destura
Brian Destura

Reputation: 12078

You can instead use Exists subquery for this, to avoid the join caused by subitems__status, so:

from django.db.models import Exists, OuterRef


class ItemQuerySet(models.QuerySet):
    def annotate_subitem_stats(self):
        return self.annotate(
            count_subitems=Count('subitems'),
            has_sent_subitems=Exists(
                SubItem.objects.filter(item=OuterRef('pk'), status=Status.sent)
            ),
        )

Upvotes: 7

Related Questions