Xantho
Xantho

Reputation: 65

Django distinct related querying

I have two models:

Model A is an AbstractUserModel and Model B

class ModelB:
    user = ForeignKey(User, related_name='modelsb')
    timestamp = DateTimeField(auto_now_add=True)

What I want to find is how many users have at least one ModelB object created at least in 3 of the 7 past days.

So far, I have found a way to do it but I know for sure there is a better one and that is why I am posting this question.

I basically split the query into 2 parts.

Part1:
I added a foo method inside the User Model that checks if a user meets the above conditions

def foo(self):
    past_limit = starting_date - timedelta(days=7)
    return self.modelsb.filter(timestamp__gte=past_limit).order_by('timestamp__day').distinct('timestamp__day').count() > 2

Part 2:
In the Custom User Manager, I find the users that have more than 2 modelsb objects in the last 7 days and iterate through them applying the foo method for each one of them.
By doing this I narrow down the iterations of the required for loop. (basically its a filter function but you get the point)

def boo(self):
    past_limit = timezone.now() - timedelta(days=7)
    candidates = super().get_queryset().annotate(rc=Count('modelsb', filter=Q(modelsb__timestamp__gte=past_limit))).filter(rc__gt=2)
    return list(filter(lambda x: x.foo(), candidates))

However, I want to know if there is a more efficient way to do this, that is without the for loop.

Upvotes: 1

Views: 112

Answers (1)

Nico Griffioen
Nico Griffioen

Reputation: 5405

You can use conditional annotation.

I haven't been able to test this query, but something like this should work:

from django.db.models import Q, Count

past_limit = starting_date - timedelta(days=7)
users = User.objects.annotate(
                    modelsb_in_last_seven_days=Count('modelsb__timestap__day', 
                        filter=Q(modelsb__timestamp__gte=past_limit), 
                        distinct=True))
            .filter(modelsb_in_last_seven_days__gte = 3)

EDIT:

This solution did not work, because the distinct option does specify what field makes an entry distinct.

I did some experimenting on my own Django instance, and found a way to make this work using SubQuery. The way this works is that we generate a subquery where we make the distinction ourself.

counted_modelb = ModelB.objects
    .filter(user=OuterRef('pk'), timestamp__gte=past_limit)
    .values('timestamp__day')
    .distinct()
    .annotate(count=Count('timestamp__day'))
    .values('count')

query = User.objects
    .annotate(modelsb_in_last_seven_days=Subquery(counted_modelb, output_field=IntegerField()))
    .filter(modelsb_in_last_seven_days__gt = 2)

This annotates each row in the queryset with the count of all distinct days in modelb for the user, with a date greater than the selected day.

In the subquery I use values('timestamp__day') to make sure I can do distinct() (Because a combination of distinct('timestamp__day') and annotate() is unsupported.)

Upvotes: 2

Related Questions