Andrew
Andrew

Reputation: 12442

How to aggregate sum over multiple date ranges in query set?

Is it possible to annotate .sum()s over multiple date ranges in one QuerySet

Ie, basically combining these, so each object has the sum for each date range.

query_set_week = DailyReports.objects.filter(
    date__range=('2018-08-27', '2018-08-31')) \
        .select_related('profile') \
        .values('profile__user_id') \
        .annotate(premium=Sum('total_field'),
            first_name=F('profile__user__first_name'),
            last_name=F('profile__user__last_name') \
        .order_by('profile__agent_code')


query_set_year = DailyReports.objects.filter(
    date__range=('2018-01-01', '2018-08-31')) \
        .select_related('profile') \
        .values('profile__user_id') \
        .annotate(premium=Sum('total_field'),
            first_name=F('profile__user__first_name'),
            last_name=F('profile__user__last_name') \
        .order_by('profile__agent_code')

Both work individually, but it's difficult to loop through and display the data (user - week total - year total, for example), because someone may have a result in the year filter, but not in the week filter.

Edit: I'm currently able to accomplish my goal using .raw() with a massive SQL statement, but I figured there was a more Pythonic way to do it.

Upvotes: 2

Views: 1143

Answers (2)

Stefan Collier
Stefan Collier

Reputation: 4682

If you want more than one row, you could union two querysets like so:

base_qs = DailyReports.objects \
        .select_related('profile') \
        .values('profile__user_id') \
        .annotate(premium=Sum('total_field'),
            first_name=F('profile__user__first_name'),
            last_name=F('profile__user__last_name') \
        .order_by('profile__agent_code')

query_set_week = base_qs.filter(date__range=('2018-08-27', '2018-08-31'))
query_set_year = base_qs.filter(date__range=('2018-01-01', '2018-08-31'))

query_set_week_and_year = query_set_week.union(query_set_year)

query_set_week_and_year should give you two rows, by only executing one SQL query!


What's happening here?

Most importantly django does not execute any SQL until a queryset is evaluated (e.g. iterated over, list()-ed, len()-ed, and so on). So we are just constructing SQL, not executing what looks like 4 queries!

Most databases (even sqlite, I think) has some query optimization. I.e. it will see things that are repeated in the two queries and do that bit first (In a way that the code appears to be doing). So we should be doing this in (near enough) the most efficient way.


So it shouldn't be too time-intensive and I think provides some readable code(?)!

Upvotes: 2

Stefan Collier
Stefan Collier

Reputation: 4682

As far as I can see it seems you've got two almost-identical queries that only differ by the date__range. Why not try using a SQL OR on the date range using the Q-objects

You can attempt the following:

query_set_week_and_year = DailyReports.objects \
    .filter(
        Q(date__range=('2018-08-27', '2018-08-31')) | Q(date__range=('2018-01-01', '2018-08-31')) \
    ).select_related('profile') \
    .values('profile__user_id') \
    .annotate(
        premium=Sum('total_field'),
        first_name=F('profile__user__first_name'),
        last_name=F('profile__user__last_name'
    ).order_by('profile__agent_code')

Q-objects in a nutshell

Q objects let you apply AND and OR filter conditions via the binary & and | operators. (But you only really need them for the OR operation as AND is handled by a comma in the filter function)

For example finding all users that have a firstname or lastname that begins with 'A':

User.objects.filter(Q(firstname__startswith='A') | Q(lastname__startswith='A'))

Side note

It seems line one date__range is contained in the other: 2018-08-27-->2018-08-31 is within 2018-01-01-->2018-08-31. If these are the two date ranges you care about, then all you need is the second query..?

Upvotes: 1

Related Questions