Antash
Antash

Reputation: 988

Django ORM filter by Max column value of two related models

I have 3 related models:

Program(Model):
    ...  # which aggregates ProgramVersions

ProgramVersion(Model):
    program = ForeignKey(Program)
    index = IntegerField()

UserProgramVersion(Model):
    user = ForeignKey(User)
    version = ForeignKey(ProgramVersion)
    index = IntegerField()

ProgramVersion and UserProgramVersion are orderable models based on index field - object with highest index in the table is considered latest/newest object (this is handled by some custom logic, not relevant).

I would like to select all latest UserProgramVersion's, i.e. latest UPV's which point to the same Program.

this can be handled by this UserProgramVersion queryset:

def latest_user_program_versions(self):
    latest = self\
        .order_by('version__program_id', '-version__index', '-index')\
        .distinct('version__program_id')

    return self.filter(id__in=latest)

this works fine however im looking for a solution which does NOT use .distinct() I tried something like this:

def latest_user_program_versions(self):
    latest = self\
        .annotate(
             'max_version_index'=Max('version__index'),
             'max_index'=Max('index'))\
        .filter(
             'version__index'=F('max_version_index'),
             'index'=F('max_index'))

    return self.filter(id__in=latest)

this however does not work

Upvotes: 4

Views: 1864

Answers (1)

hynekcer
hynekcer

Reputation: 15548

Use Subquery() expressions in Django 1.11. The example in docs is similar and the purpose is also to get the newest item for required parent records.

(You could start probably by that example with your objects, but I wrote also a complete more complicated suggestion to avoid possible performance pitfalls.)

from django.db.models import OuterRef, Subquery

...
def latest_user_program_versions(self, *args, **kwargs):
    # You should filter users by args or kwargs here, for performance reasons.
    # If you do it here it is applied also to subquery - much faster on a big db.
    qs = self.filter(*args, **kwargs)
    parent = Program.objects.filter(pk__in=qs.values('version__program'))
    newest = (
        qs.filter(version__program=OuterRef('pk'))
        .order_by('-version__index', '-index')
    )
    pks = (
        parent.annotate(newest_id=Subquery(newest.values('pk')[:1]))
        .values_list('newest_id', flat=True)
    )
    # Maybe you prefer to uncomment this to be it compiled by two shorter SQLs.
    # pks = list(pks)
    return self.filter(pk__in=pks)

If you considerably improve it, write the solution in your answer.


EDIT Your problem in your second solution:
Nobody can cut a branch below him, neither in SQL, but I can sit on its temporary copy in a subquery, to can survive it :-) That is also why I ask for a filter at the beginning. The second problem is that Max('version__index') and Max('index') could be from two different objects and no valid intersection is found.


EDIT2: Verified: The internal SQL from my query is complicated, but seems correct.

SELECT app_userprogramversion.id,...
FROM app_userprogramversion
WHERE app_userprogramversion.id IN
   (SELECT
       (SELECT U0.id
        FROM app_userprogramversion U0
        INNER JOIN app_programversion U2 ON (U0.version_id = U2.id)
        WHERE (U0.user_id = 123 AND U2.program_id = (V0.id))
        ORDER BY U2.index DESC, U0.index DESC LIMIT 1
        ) AS newest_id
    FROM app_program V0 WHERE V0.id IN
       (SELECT U2.program_id AS Col1
        FROM app_userprogramversion U0
        INNER JOIN app_programversion U2 ON (U0.version_id = U2.id)
        WHERE U0.user_id = 123
        )
    )

Upvotes: 3

Related Questions