Reputation: 41
Goal
I'm trying to reduce the data sent to the front-end of an application by subsampling a timeseries. Initially by just taking every nth row, because I also want to optimize for speed.
Layout
I'm using Django 2.1.5, with a PostgreSQL backend. The time series table is filled with measurement data every 1-15 minutes from various sensors.
models.py
class Measurement(models.Model):
sensor = models.ForeignKey(Category,on_delete=models.PROTECT,related_name='measurements')
datapoint = models.DecimalField(max_digits=8, decimal_places=4)
time = models.DateTimeField(auto_now_add=True)
Related questions
A few resources came up around a close solution here How to filter/reduce a QuerySet to every nth row? and https://stackoverflow.com/a/56487889/11225898 which is based around annotating with 'F' like so:
views.py
Measurement.objects.annotate(idmod2=F('id') % 2).filter(idmod2=0)
This does work till a certain extent, however as the measurements are made by ~5-25 different sensors and come in every ~1-15 minutes, the id field is not consistently tracking. Just to visualize an simplified example:
+----+--------+------+-------+
| id | sensor | data | time |
+----+--------+------+-------+
| 1 | A | 432 | 10:00 |
| 2 | A | 534 | 10:15 |
| 3 | B | 2342 | 10:20 |
| 4 | B | 87 | 10:25 |
| 5 | B | 2 | 10:30 |
| 6 | B | 982 | 10:45 |
| 7 | A | 23 | 10:45 |
| 8 | B | 400 | 10:50 |
+----+--------+------+-------+
If one now tries to filter for sensor 'A' with %2 the list only looks like
| 2 | A | 534 | 10:15 |
while for 'B'
| 4 | B | 87 | 10:25 |
| 6 | B | 982 | 10:45 |
| 8 | B | 400 | 10:50 |
Window function
I thought of using the Window function of Django
Measurement.objects.annotate(place=Window(expression=RowNumber(),partition_by=[F('sensor')], order_by=F('time').desc()))
Which gives
django.db.utils.NotSupportedError: Window is disallowed in the filter clause.
And this is also mentioned in the docs
Defaults to False. The SQL standard disallows referencing window functions in the WHERE clause and Django raises an exception when constructing a QuerySet that would do that
and stems from the SQL logical query processing, which first goes through WHERE before the SELECT, so there is no way of executing this query.
Question
What is a reasonably fast way of sub sampling this time series to reduce the dataset? Ideally, I would pass a filter on the time and sensor
.filter(sensor=thisSensor).filter(time__range=(rangeStart, rangeEnd))
and show a subsample, of say 200, records.
Upvotes: 4
Views: 325
Reputation: 717
One way you can do is, as your data is inconsistent, you can either quantize data before sending or you can choose the points you want and then by regression predict missing points in js in browser
Upvotes: 1