AlanSE
AlanSE

Reputation: 2775

How to annotate a distinct Count over multiple relationships in Django?

Given a model that has more than one kind of connection to a related model (I will call the "parent" model), how could I annotate a queryset with a count of parent model objects that are linked through either connection without counting duplicates?

Example model definitions

Consider an Article model that has 2 links to a parent Publication model that are very similar in meaning.

from django.db import models

class Publication(models.Model):
    pass

class Article(models.Model):
    publication = models.ForeignKey(Publication, related_name='publications')
    owner = models.ForeignKey(Publication, related_name='owned_articles')

Objective

I want to serve a page that is a list of publications. A business requirement is that the number of articles that the publication wishes to take credit for are shown (these publications prefer a generous metric for counting). An article is considered part of the organization if either the "owner" or "publication" field points to it, but no articles should be counted more than once for a single publication. An article may be included in the count of 2 publications if publication points to a different object than owner.

I don't want to execute a query for every publication in the list.

The problem with Count annotations here

Publication.objects.annotate(Count('publications'), Count('owned_articles')) would be trivial. Then I will have count__publications and count__owned_articles.

My problem is that I can't tell how many articles in count__publications were also counted in count__owned_articles. Django doesn't allow me to cram a full queryset into Count, so in this general case of needing extra control of what is counted a special mechanism is needed.

Similar questions

I have found this situation most similar to the question here:

Django annotate count with a distinct field

You could contrive this same general situation by intensifying that question's request by adding another related model in addition to InformationUnit and asking for a count of unique usernames among both related models.

Upvotes: 2

Views: 967

Answers (2)

odigity
odigity

Reputation: 8166

I believe the correct answer is using Count("publications", distinct=True), as described here:

https://docs.djangoproject.com/en/3.2/topics/db/aggregation/#combining-multiple-aggregations

Upvotes: 1

AlanSE
AlanSE

Reputation: 2775

(initial answer, answering my own question with a so-so solution)

The preferable approach would be to start with a Publication queryset, however, I can manage to squeeze a solution out of the Django ORM by pivoting around the Article queryset instead.

Consider as a solution to this problem:

exclusive_owners_qs = Article.objects.exclude(
    publication=F('owner')
).annotate(Count('publication')).order_by('publication')

publications_qs = Article.objects.annotate(Count('owner')).order_by('owner')

With this, I can loop over the two querysets and add up the 2 numbers locally inside of python to get the correct counts.

This satisfies the requirements, but it's also not an elegant solution. Eliminating the need for a python loop would be ideal.

Upvotes: 1

Related Questions