Reputation: 2421
I'm trying to calculate a lookup for items sharing a common related object + a common value. As an informal summary, I want to use some form of:
my_queryset.values(my_related_obj, my_date).annotate(...)
--in order to get, for each (related_obj, date)
pair a Sum aggregation of matching items in the queryset.
The related_obj
in this case is two joins away -- a Many-to-Many relation, and then a foreign-key. So there are definitely queryset items with multiple values for the related object, or reaching the same related object through a different intermediary object. This seems like it should be doable, but I keep getting incorrect results no matter what I try. I can't get the grouping to be what I want it.
Here's a simple example: I have Persons
assigned to Teams
; I have ExpenseReports
signed by (multiple) Persons
on particular dates.
I want a query that finds, for each pair of Team
and date
, the total expense signed for by that team on that date.
Here's my models:
class MyTeam(models.Model):
name = models.CharField()
class MyPerson(models.Model):
name = models.CharField()
team = models.ForeignKey(MyTeam, on_delete=models.CASCADE)
class ExpenseReport(models.Model):
expense_paid = models.FloatField()
expense_date = models.DateField()
persons = models.ManyToManyField(MyPerson)
And here's some simple data -- expense reports on two dates.
Appa
and Alex
on Team A
; Barbara
and Bob
are on Team B
:
[2024-11-01] 1.0 paid by [<MyPerson: Person <Alex>>, <MyPerson: Person <Appa>>] <-- Team A
[2024-11-01] 10.0 paid by [<MyPerson: Person <Barbara>>, <MyPerson: Person <Bob>>] <-- Team B
[2024-11-05] 100.0 paid by [<MyPerson: Person <Barbara>>] <-- Team B
[2024-11-05] 1000.0 paid by [<MyPerson: Person <Alex>>, <MyPerson: Person <Bob>>] <-- Teams A and B
With this data, the result I'm looking for is:
{'expense_date': datetime.date(2024, 11, 1), 'persons__team__name': 'A Team', 'total_expense': 1.0}
{'expense_date': datetime.date(2024, 11, 1), 'persons__team__name': 'B Team', 'total_expense': 10.0}
{'expense_date': datetime.date(2024, 11, 5), 'persons__team__name': 'A Team', 'total_expense': 1000.0}
{'expense_date': datetime.date(2024, 11, 5), 'persons__team__name': 'B Team', 'total_expense': 1100.0}
What I've Tried
There's the obvious naive implementation -- which is incorrect because it doesn't account for duplicate rows:
reports_qs = ExpenseReport.objects.all()
rows = reports_qs.values("expense_date", "persons__team__name").annotate(total_expense=Sum("expense_paid"))
Easy to see these results are wrong -- values with two team members from the same team get doubled:
{'expense_date': datetime.date(2024, 11, 1), 'persons__team__name': 'B Team', 'total_expense': 20.0} <-- doubled
{'expense_date': datetime.date(2024, 11, 5), 'persons__team__name': 'B Team', 'total_expense': 1100.0}
{'expense_date': datetime.date(2024, 11, 5), 'persons__team__name': 'A Team', 'total_expense': 1000.0}
{'expense_date': datetime.date(2024, 11, 1), 'persons__team__name': 'A Team', 'total_expense': 2.0} <-- doubled
But I thought the solution was using a subquery, and that didn't work either:
reports_qs = ExpenseReport.objects.all()
subquery = (
ExpenseReport.objects.filter(
expense_date=OuterRef("expense_date"),
persons__team__name=OuterRef("persons__team__name"),
)
.values("expense_date", "persons__team__name")
.annotate(total_expense=Sum("expense_paid"))
.values("total_expense")
)
rows = reports_qs.values("expense_date", "persons__team__name").annotate(total_expense=subquery[:1])
This gave the result:
{'expense_date': datetime.date(2024, 11, 1), 'persons__team__name': 'A Team', 'total_expense': 2.0} <-- doubled
{'expense_date': datetime.date(2024, 11, 1), 'persons__team__name': 'A Team', 'total_expense': 2.0} <-- doubled
{'expense_date': datetime.date(2024, 11, 1), 'persons__team__name': 'B Team', 'total_expense': 20.0} <-- doubled
{'expense_date': datetime.date(2024, 11, 1), 'persons__team__name': 'B Team', 'total_expense': 20.0} <-- doubled
{'expense_date': datetime.date(2024, 11, 5), 'persons__team__name': 'B Team', 'total_expense': 1100.0}
{'expense_date': datetime.date(2024, 11, 5), 'persons__team__name': 'B Team', 'total_expense': 1100.0}
{'expense_date': datetime.date(2024, 11, 5), 'persons__team__name': 'A Team', 'total_expense': 1000.0}
--and other desperate attempts to stick distinct()
somewhere helpful did not solve the issue.
I can also see that I outright get a different result from the values().annotate()
structure than I do with an aggregate()
call:
expense_date, team_name = example_report.expense_date, example_report.persons.first().team.name
# values().annotate()
res1 = (
ExpenseReport.objects.filter(
expense_date=expense_date,
persons__team__name=team_name,
)
.values("expense_date", "persons__team__name")
.annotate(total_expense=Sum("expense_paid"))
.values("total_expense")
)
# aggregate()
res2 = (
ExpenseReport.objects.filter(
expense_date=expense_date,
persons__team__name=team_name,
)
.distinct()
.aggregate(total_expense=Sum("expense_paid"))
)
With the result:
> res1=<QuerySet [{'total_expense': 2.0}]> # ...doubled yet again
> res2={'total_expense': 1.0} # correct
Is it possible to perform the aggregation I'm attempting within the Django ORM? What am I doing wrong?
Upvotes: 2
Views: 45
Reputation: 477533
If all .person
s of an ExpenseReport
are always guaranteed to be of the same Team
, we can do this with a Subquery
expression [Django-doc]:
from django.db.models import OuterRef, Subquery, Sum
ExpenseReport.objects.values(
'expense_date',
team_id=Subquery(
MyPerson.objects.filter(
expensereport=OuterRef('pk')
).values('team_id')[:1]
),
).annotate(total_expense=Sum('expense_paid')).order_by('expense_date', 'team_id')
The Subquery(…)
thus obtains the primary key of the team of the "first" person (whatever the order is). We assume that the Person
s all have the same team here.
The question is more what to do if an expense is split over multiple teams.
Upvotes: 0