ogr
ogr

Reputation: 690

Django ORM and GROUP BY

Newcommer to Django here.

I'm currently trying to fetch some data from my model with a query that need would need a GROUP BY in SQL.

Here is my simplified model:

class Message(models.Model):
    mmsi = models.CharField(max_length=16)
    time = models.DateTimeField()
    point = models.PointField(geography=True)

I'm basically trying to get the last Message from every distinct mmsi number.

In SQL that would translates like this for example:

select a.* from core_message a
inner join
(select mmsi, max(time) as time from core_message group by mmsi) b
on a.mmsi=b.mmsi and a.time=b.time;

After some tries, I managed to have something working similarly with Django ORM:

>>> mf=Message.objects.values('mmsi').annotate(Max('time'))
>>> Message.objects.filter(mmsi__in=mf.values('mmsi'),time__in=mf.values('time__max'))

That works, but I find my Django solution quite clumsy. Not sure it's the proper way to do it.

Looking at the underlying query this looks like this :

>>> print(Message.objects.filter(mmsi__in=mf.values('mmsi'),time__in=mf.values('time__max')).query)
SELECT "core_message"."id", "core_message"."mmsi", "core_message"."time", "core_message"."point"::bytea FROM "core_message" WHERE ("core_message"."mmsi" IN (SELECT U0."mmsi" FROM "core_message" U0 GROUP BY U0."mmsi") AND "core_message"."time" IN (SELECT MAX(U0."time") AS "time__max" FROM "core_message" U0 GROUP BY U0."mmsi"))

I'd appreciate if you could propose a better solution for this problem.

Thanks !

Upvotes: 0

Views: 203

Answers (2)

ogr
ogr

Reputation: 690

Using the answers and comments, I managed to solve this using a subquery or a simple distinct order by.

Simple distinct order by solution inspired by @Oriphiel answer:

Message.objects.distinct('mmsi').order_by('mmsi','-time')

The underlying SQL query looks like this :

SELECT DISTINCT ON ("core_message"."mmsi") "core_message"."id", "core_message"."mmsi", "core_message"."time", "core_message"."point"::bytea
FROM "core_message" 
ORDER BY "core_message"."mmsi" ASC, "core_message"."time" DESC

Simple and straightforward.

Subquery solution inspired by @DanielRoseman comment:

time_order=Message.objects.filter(mmsi=OuterRef('mmsi')).order_by('-time')
Message.objects.filter(id__in=Subquery(time_order.values('id')[:1]))

The underlying SQL query looks like this :

SELECT "core_message"."id", "core_message"."mmsi", "core_message"."time", "core_message"."point"::bytea 
FROM "core_message" 
WHERE "core_message"."id" IN 
(SELECT U0."id" FROM "core_message" U0 WHERE U0."mmsi" = ("core_message"."mmsi") ORDER BY U0."time" DESC  LIMIT 1)

A tad more complex but it gives more flexibility. If I wanted to get first five messages for every MMSI, I'd just need to change the LIMIT value. In Django, it would look like this :

Message.objects.filter(id__in=Subquery(time_order.values('id')[:5]))

Upvotes: 1

Oriphiel
Oriphiel

Reputation: 59

You only need something like this:

Message.objects.all().distinct('mmsi').values('mmsi', 'time').order_by('mmsi','-id')

or like this:

Message.objects.all().values('mmsi').annotate(date_last=Max('time'))

Note: the last is translate by Django in this sql query:

SELECT "message"."mmsi", MAX("message"."time") AS "date_last" FROM "message" GROUP BY "message"."mmsi", "message"."time" ORDER BY "message"."time" DESC

Upvotes: 1

Related Questions