sadat
sadat

Reputation: 4362

Get top n records for each group with Django queryset

I have a model like the following Table,

create table `mytable`
(
  `person` varchar(10),
  `groupname` int,
  `age` int
);

And I want to get the 2 oldest people from each group. The original SQL question and answers are here StackOverflow and One of the solutions that work is

SELECT
    person,
    groupname,
    age
FROM
(
    SELECT
        person,
        groupname,
        age,
        @rn := IF(@prev = groupname, @rn + 1, 1) AS rn,
        @prev := groupname
    FROM mytable
    JOIN (SELECT @prev := NULL, @rn := 0) AS vars
    ORDER BY groupname, age DESC, person
) AS T1
WHERE rn <= 2

You can check the SQL output here as well SQLFIDLE

I just want to know how can I implement this query in Django's views as queryset.

Upvotes: 2

Views: 1604

Answers (1)

wiaterb
wiaterb

Reputation: 543

Another SQL with similar output would have window function that annotates each row with row number within particular group name and then you would filter row numbers lower or equal 2 in HAVING clause.

At the moment of writing django does not support filtering based on window function result so you need to calculate row in the first query and filter People in the second query.

Following code is based on similar question but it implements limiting number of rows to be returned per group_name.

from django.db.models import F, When, Window
from django.db.models.functions import RowNumber

person_ids = {
    pk
    for pk, row_no_in_group in Person.objects.annotate(
        row_no_in_group=Window(
            expression=RowNumber(), 
            partition_by=[F('group_name')],
            order_by=['group_name', F('age').desc(), 'person']
        )
    ).values_list('id', 'row_no_in_group')
    if row_no_in_group <= 2
}
filtered_persons = Person.objects.filter(id__in=person_ids)

For following state of Person table

>>> Person.objects.order_by('group_name', '-age', 'person').values_list('group_name', 'age', 'person')
<QuerySet [(1, 19, 'Brian'), (1, 17, 'Brett'), (1, 14, 'Teresa'), (1, 13, 'Sydney'), (2, 20, 'Daniel'), (2, 18, 'Maureen'), (2, 14, 'Vincent'), (2, 12, 'Carlos'), (2, 11, 'Kathleen'), (2, 11, 'Sandra')]>

queries above return

>>> filtered_persons.order_by('group_name', '-age', 'person').values_list('group_name', 'age', 'person')
<QuerySet [(1, 19, 'Brian'), (1, 17, 'Brett'), (2, 20, 'Daniel'), (2, 18, 'Maureen')]>

Upvotes: 5

Related Questions