dArignac
dArignac

Reputation: 1209

Remove some "duplicates" from a Django QuerySet

I have a Django QuerySet containing several results for a model:

class MyModel(models.Model):
    value = models.FloatField()
    date_seen = models.DateTimeField()

Now the QuerySet is limited over time and contains duplicate results:

MyModel.date_seen   - MyModel.value
2016-03-01 10:00:00 - 124
2016-03-01 11:00:00 - 124
2016-03-01 12:00:00 - 99
2016-03-01 13:00:00 - 99
2016-03-01 14:00:00 - 99
2016-03-01 15:00:00 - 103
2016-03-01 16:00:00 - 103
2016-03-01 17:00:00 - 176
2016-03-01 18:00:00 - 12
2016-03-01 19:00:00 - 12

Now what I want to have is a QuerySet only containing the results where the value has changed, so the result would be the following:

MyModel.date_seen   - MyModel.value
2016-03-01 10:00:00 - 124
2016-03-01 12:00:00 - 99
2016-03-01 15:00:00 - 103
2016-03-01 17:00:00 - 176
2016-03-01 18:00:00 - 12

The list is just an example and the result count is significantly higher (about 1000 results) so I need a performant solution ;-) Is this even possible? Any suggestions? Thanks a lot!

Btw. I use postgresql underneath as database.

Upvotes: 2

Views: 665

Answers (1)

niklas
niklas

Reputation: 3011

I found a one query solution that only works on POSTGRESQL, based on this beautiful post, where they explain how to make use of postgres lead function that can be used to evaluate values of arbitrary rows before the current row. Just make sure to replace clubs_mymodel with [yourappname]_mymodel in the query below and you should be ready to go!

Output:

for m in MyModel.objects.all():
   print m

2016-03-21 10:00:00+00:00 - 124.0
2016-03-21 11:00:00+00:00 - 124.0
2016-03-21 12:00:00+00:00 - 99.0
2016-03-21 13:00:00+00:00 - 99.0
2016-03-21 14:00:00+00:00 - 99.0
2016-03-21 16:00:00+00:00 - 103.0
2016-03-21 15:00:00+00:00 - 103.0
2016-03-21 17:00:00+00:00 - 176.0
2016-03-21 18:00:00+00:00 - 12.0
2016-03-21 19:00:00+00:00 - 12.0
2016-03-21 20:00:00+00:00 - 90.0
2016-03-21 21:00:00+00:00 - 12.0

q = MyModel.objects.raw('SELECT m1.id, m1.value, m1.date_seen \
    FROM (SELECT m2.id, m2.value, m2.date_seen, lead(m2.value) \
        OVER (ORDER BY m2.date_seen DESC) as prev_value \
        FROM clubs_mymodel m2 ORDER BY m2.date_seen) as m1 \
    WHERE m1.value IS DISTINCT FROM m1.prev_value \
    ORDER BY m1.date_seen ASC')

for m in q:
   print m

2016-03-21 10:00:00+00:00 - 124.0
2016-03-21 12:00:00+00:00 - 99.0
2016-03-21 15:00:00+00:00 - 103.0
2016-03-21 17:00:00+00:00 - 176.0
2016-03-21 18:00:00+00:00 - 12.0
2016-03-21 20:00:00+00:00 - 90.0
2016-03-21 21:00:00+00:00 - 12.0

Upvotes: 2

Related Questions