Ziv
Ziv

Reputation: 2421

Can I aggregate over values(Many-to-Many->val_a, val_b)?

I'm trying to calculate a lookup for items sharing a common related object + a common value. As an informal summary, I want to use some form of:

my_queryset.values(my_related_obj, my_date).annotate(...)

--in order to get, for each (related_obj, date) pair a Sum aggregation of matching items in the queryset.

The related_obj in this case is two joins away -- a Many-to-Many relation, and then a foreign-key. So there are definitely queryset items with multiple values for the related object, or reaching the same related object through a different intermediary object. This seems like it should be doable, but I keep getting incorrect results no matter what I try. I can't get the grouping to be what I want it.

Here's a simple example: I have Persons assigned to Teams; I have ExpenseReports signed by (multiple) Persons on particular dates. I want a query that finds, for each pair of Team and date, the total expense signed for by that team on that date.

Here's my models:

class MyTeam(models.Model):
    name = models.CharField()

class MyPerson(models.Model):
    name = models.CharField()
    team = models.ForeignKey(MyTeam, on_delete=models.CASCADE)

class ExpenseReport(models.Model):
    expense_paid = models.FloatField()
    expense_date = models.DateField()
    persons = models.ManyToManyField(MyPerson)

And here's some simple data -- expense reports on two dates. Appa and Alex on Team A ; Barbara and Bob are on Team B:

[2024-11-01] 1.0 paid by [<MyPerson: Person <Alex>>, <MyPerson: Person <Appa>>]    <-- Team A
[2024-11-01] 10.0 paid by [<MyPerson: Person <Barbara>>, <MyPerson: Person <Bob>>] <-- Team B
[2024-11-05] 100.0 paid by [<MyPerson: Person <Barbara>>]                          <-- Team B
[2024-11-05] 1000.0 paid by [<MyPerson: Person <Alex>>, <MyPerson: Person <Bob>>]  <-- Teams A and B

With this data, the result I'm looking for is:

{'expense_date': datetime.date(2024, 11, 1), 'persons__team__name': 'A Team', 'total_expense': 1.0}
{'expense_date': datetime.date(2024, 11, 1), 'persons__team__name': 'B Team', 'total_expense': 10.0}
{'expense_date': datetime.date(2024, 11, 5), 'persons__team__name': 'A Team', 'total_expense': 1000.0}
{'expense_date': datetime.date(2024, 11, 5), 'persons__team__name': 'B Team', 'total_expense': 1100.0}

What I've Tried

There's the obvious naive implementation -- which is incorrect because it doesn't account for duplicate rows:

    reports_qs = ExpenseReport.objects.all()
    rows = reports_qs.values("expense_date", "persons__team__name").annotate(total_expense=Sum("expense_paid"))

Easy to see these results are wrong -- values with two team members from the same team get doubled:

{'expense_date': datetime.date(2024, 11, 1), 'persons__team__name': 'B Team', 'total_expense': 20.0} <-- doubled
{'expense_date': datetime.date(2024, 11, 5), 'persons__team__name': 'B Team', 'total_expense': 1100.0}
{'expense_date': datetime.date(2024, 11, 5), 'persons__team__name': 'A Team', 'total_expense': 1000.0}
{'expense_date': datetime.date(2024, 11, 1), 'persons__team__name': 'A Team', 'total_expense': 2.0}  <-- doubled

But I thought the solution was using a subquery, and that didn't work either:

    reports_qs = ExpenseReport.objects.all()
    subquery = (
        ExpenseReport.objects.filter(
            expense_date=OuterRef("expense_date"),
            persons__team__name=OuterRef("persons__team__name"),
        )
        .values("expense_date", "persons__team__name")
        .annotate(total_expense=Sum("expense_paid"))
        .values("total_expense")
    )
    rows = reports_qs.values("expense_date", "persons__team__name").annotate(total_expense=subquery[:1])

This gave the result:

{'expense_date': datetime.date(2024, 11, 1), 'persons__team__name': 'A Team', 'total_expense': 2.0} <-- doubled
{'expense_date': datetime.date(2024, 11, 1), 'persons__team__name': 'A Team', 'total_expense': 2.0} <-- doubled
{'expense_date': datetime.date(2024, 11, 1), 'persons__team__name': 'B Team', 'total_expense': 20.0} <-- doubled
{'expense_date': datetime.date(2024, 11, 1), 'persons__team__name': 'B Team', 'total_expense': 20.0} <-- doubled
{'expense_date': datetime.date(2024, 11, 5), 'persons__team__name': 'B Team', 'total_expense': 1100.0}
{'expense_date': datetime.date(2024, 11, 5), 'persons__team__name': 'B Team', 'total_expense': 1100.0}
{'expense_date': datetime.date(2024, 11, 5), 'persons__team__name': 'A Team', 'total_expense': 1000.0}

--and other desperate attempts to stick distinct() somewhere helpful did not solve the issue.

I can also see that I outright get a different result from the values().annotate() structure than I do with an aggregate() call:

expense_date, team_name = example_report.expense_date, example_report.persons.first().team.name

    # values().annotate()
    res1 = (
        ExpenseReport.objects.filter(
            expense_date=expense_date,
            persons__team__name=team_name,
        )
        .values("expense_date", "persons__team__name")
        .annotate(total_expense=Sum("expense_paid"))
        .values("total_expense")
    )

    # aggregate()
    res2 = (
        ExpenseReport.objects.filter(
            expense_date=expense_date,
            persons__team__name=team_name,
        )
        .distinct()
        .aggregate(total_expense=Sum("expense_paid"))
    )

With the result:

> res1=<QuerySet [{'total_expense': 2.0}]>  # ...doubled yet again
> res2={'total_expense': 1.0}               # correct

Is it possible to perform the aggregation I'm attempting within the Django ORM? What am I doing wrong?

Upvotes: 2

Views: 45

Answers (1)

willeM_ Van Onsem
willeM_ Van Onsem

Reputation: 477533

If all .persons of an ExpenseReport are always guaranteed to be of the same Team, we can do this with a Subquery expression [Django-doc]:

from django.db.models import OuterRef, Subquery, Sum

ExpenseReport.objects.values(
    'expense_date',
    team_id=Subquery(
        MyPerson.objects.filter(
            expensereport=OuterRef('pk')
        ).values('team_id')[:1]
    ),
).annotate(total_expense=Sum('expense_paid')).order_by('expense_date', 'team_id')

The Subquery(…) thus obtains the primary key of the team of the "first" person (whatever the order is). We assume that the Persons all have the same team here.

The question is more what to do if an expense is split over multiple teams.

Upvotes: 0

Related Questions