Owais Lone
Owais Lone

Reputation: 698

Django prefetch_related with limit

Is there a way to tell prefetch_related to only fetch a limited set of related objects? Lets say I am fetching a list of users and I know I want to fetch their recent comments. Instead of fetching comments for each user in a loop, I use prefetch_related to pre-fetch them at the time of fetching the users. My understanding is that this will fetch all the comments made by any user present in the result of the original query but I only want to show the latest 5 for each user.

How does this affect the performance if the list of comments is really huge? Is there a way to fetch only 5 comments for each user in a single (or 2) query? It doesn't have to be the same query as the original one for fetching users but that would be nice.

I essentially want to turn this

   users = User.objects.all()
   for user in users:
       user.comments.all()[:10]

into something like this

 User.objects.all().prefetch_related('comments', limit=10)

so if a user has 100s or 10000s of comments, they are not all loaded into memory. How would you do something like this in raw SQL?

Upvotes: 43

Views: 20886

Answers (4)

gospodima
gospodima

Reputation: 11

It is also possible to do by using CTE and ROW_NUMBER().

from django.db.models import Prefetch
from django.db.models.functions.window import RowNumber
from django_cte import With

cte = With(
    Comment.objects.annotate(
        row_number=Window(
            expression=RowNumber(),
            partition_by=F("user_id")
        )
    )
)
qs = cte.with_cte(cte).filter(row_number<=10)
users = User.objects.prefetch_related(
    Prefetch("comments", queryset=qs, to_attr="limited_comments")
)

Upvotes: 1

Rafi
Rafi

Reputation: 135

thats what actually works for me django(2.1) (based on haseebahmad answer).
in order for prefetch_related to accept customize queryset: Prefetch
so:

from django.db.models import OuterRef, Subquery ,Prefetch

User.objects.all().prefetch_related(Prefetch('comment_set',  
queryset=Comment.objects.filter(id__in= 
Subquery(Comment.objects.filter(user_id=OuterRef('user_id')).
values_list('id', flat=True)[:1]))))

Upvotes: 4

haseebahmad
haseebahmad

Reputation: 563

I think there is a workaround now to in django new version as we have OuterRef and Subquery.

from django.db.models import OuterRef, Subquery, Prefetch

subqry = Subquery(Comment.objects \
    .filter(user_id=OuterRef('user_id')) \
    .values_list('id', flat=True)[:5])

User.objects.prefetch_related(
    Prefetch('comments', queryset=Comment.objects.filter(id__in=subqry)))

Upvotes: 44

C14L
C14L

Reputation: 12558

The only way to limit the number of prefetched related objects seems to be using Prefetch() and filtering on fileds. Using sliceing

User.objects.all().prefetch_related(
    Prefetch('msg_sent', queryset=UserMsg.objects.order_by('-created')[:10]))

returns an error

AssertionError: Cannot filter a query once a slice has been taken.

The only way to limit the number of related objects seems to be using filter on a value, for example

from datetime import datetime, timedelta
timelimit = datetime.now() - timedelta(days=365)

User.objects.all().prefetch_related(
    Prefetch('msg_sent', queryset=UserMsg.objects.filter(created__gte=timelimit)))

While that doesn't return a fixed number, in may be useful in some situation, and it will reduce the number of prefetched objects.

Upvotes: 13

Related Questions