Eric van der Helm
Eric van der Helm

Reputation: 41

How to subsample time series, to every n’th row in Django?

Goal

I'm trying to reduce the data sent to the front-end of an application by subsampling a timeseries. Initially by just taking every nth row, because I also want to optimize for speed.

Layout

I'm using Django 2.1.5, with a PostgreSQL backend. The time series table is filled with measurement data every 1-15 minutes from various sensors.

models.py

class Measurement(models.Model):
    sensor = models.ForeignKey(Category,on_delete=models.PROTECT,related_name='measurements')
    datapoint = models.DecimalField(max_digits=8, decimal_places=4)
    time = models.DateTimeField(auto_now_add=True)

Related questions

A few resources came up around a close solution here How to filter/reduce a QuerySet to every nth row? and https://stackoverflow.com/a/56487889/11225898 which is based around annotating with 'F' like so:

views.py
Measurement.objects.annotate(idmod2=F('id') % 2).filter(idmod2=0)

This does work till a certain extent, however as the measurements are made by ~5-25 different sensors and come in every ~1-15 minutes, the id field is not consistently tracking. Just to visualize an simplified example:

+----+--------+------+-------+
| id | sensor | data | time  |
+----+--------+------+-------+
|  1 | A      |  432 | 10:00 |
|  2 | A      |  534 | 10:15 |
|  3 | B      | 2342 | 10:20 |
|  4 | B      |   87 | 10:25 |
|  5 | B      |    2 | 10:30 |
|  6 | B      |  982 | 10:45 |
|  7 | A      |   23 | 10:45 |
|  8 | B      |  400 | 10:50 |
+----+--------+------+-------+

If one now tries to filter for sensor 'A' with %2 the list only looks like

|  2 | A      |  534 | 10:15 |

while for 'B'

|  4 | B      |   87 | 10:25 |
|  6 | B      |  982 | 10:45 |
|  8 | B      |  400 | 10:50 |

Window function

I thought of using the Window function of Django

 Measurement.objects.annotate(place=Window(expression=RowNumber(),partition_by=[F('sensor')], order_by=F('time').desc()))

Which gives

django.db.utils.NotSupportedError: Window is disallowed in the filter clause.

And this is also mentioned in the docs

Defaults to False. The SQL standard disallows referencing window functions in the WHERE clause and Django raises an exception when constructing a QuerySet that would do that

and stems from the SQL logical query processing, which first goes through WHERE before the SELECT, so there is no way of executing this query.

Question

What is a reasonably fast way of sub sampling this time series to reduce the dataset? Ideally, I would pass a filter on the time and sensor

.filter(sensor=thisSensor).filter(time__range=(rangeStart, rangeEnd))

and show a subsample, of say 200, records.

Upvotes: 4

Views: 325

Answers (1)

Harsh Nagarkar
Harsh Nagarkar

Reputation: 717

One way you can do is, as your data is inconsistent, you can either quantize data before sending or you can choose the points you want and then by regression predict missing points in js in browser

Upvotes: 1

Related Questions