Reputation: 65
I am developing an app that allows users to choose an ocean buoy and then explore its historical time series data via interactive plotly plots. My end goal is that the user can input a location, which then loads a buoy's data, and the user can then choose filters for which data they'd like to have plotted (so basically a dashboard). For example, maybe they'd like to view only swells that were larger than 1 ft and occurred in the month of August.
I know how to do this using pandas and plotly, but I'm struggling with how to transfer this to a django database framework.
The code currently creates a buoy object that has a data attribute. This data attribute is a dictionary that is structured as follows:
{'stdmet':{'meta':{dictionary of metadata}, 'data':[pandas dataframe of time series data]}}
I have the data in a pandas dataframe as it allows me to easily manipulate and create plotly interactive plots.
My question is how should I save this to a django model? I think I should have a buoy class which has fields for the meta data (e.g., buoy ID, lat/lon, etc.) and then I'd like to have a final field for the time series data. I would like this field to remain as the 2D pandas dataframe.
Another reason I find this method desirable is that it won't create a new buoy object for every single time measurement. Rather, the buoy object will contain the 2D dataframe of all the historical measurements within itself. This just seems cleaner.
Any advice on this is greatly appreciated!
Upvotes: 6
Views: 4414
Reputation: 4284
As of 2021, there is now a library for storing simple timeseries vectors in the database as a custom Django field: django-simple-timeseries
.
class Buoy(models.Model):
location = models.PointField()
name = models.CharField(max_length=200, null=True, blank=True)
swell_size_ft = TimeseriesField(max_points=100, resolution_seconds=3600)
This is a convenient option for small timeseries, and where it is safe to truncate data (after max_points
).
(Disclaimer: I wrote this library based on my own need, which was similar to the questioner's.)
Upvotes: 2
Reputation: 2529
The standard approach here would be to use Django models (translated to tables in the db):
Buoy
BuoyData
The BuoyData
class will have a ForeignKey
to the Buoy
model.
Also look into GeoDjango for storing and querying lat/lngs. You will want to use a PointField
from django.contrib.gis.db.models
. Here are some example models:
from django.contrib.gis.db import models
class Buoy(models.Model):
"""
Represents a Buoy
"""
location = models.PointField()
name = models.CharField(max_length=200, null=True, blank=True)
class BuoyData(models.Model):
"""
Represents a single Buoy datapoint
"""
buoy = models.ForeignKey(Buoy, related_name='data')
date = models.DateTimeField()
# swell size in feet
swell_size_ft = models.DecimalField(decimal_places=2, max_digits=5)
You can then query for a Buoy
at or near a lat/lng, then query for the Buoy
's data.
The django_pandas package has as very helpful method to read a Django Queryset into a Pandas dataframe:
from django_pandas.io import read_frame
buoy = Buoy.objects.get(id=10)
data_queryset = buoy.data.all()
df = read_frame(data_queryset)
# or another example
import datetime
from django.utils import timezone
from django.contrib.gis.geos import Point
lat = 36.123
lng = -121.123
# notice how lng is x, lat is y
location = Point(lng, lat)
time_four_days_ago = timezone.now() - datetime.timedelta(days=4)
closest_buoy = Buoy.objects.distance(location).order_by('distance')[0]
data_queryset = buoy.data.filter(date__gte=time_four_days_ago)
# if there are a lot of fields on your BuoyData model,
# and you only need a few, .values will be more performant
data_values = data_queryset.values('date', 'swell_size_ft')
df = read_frame(data_values)
Why your approach will not work:
Relational databases (MySQL, Postgresql) generally store data in fixed sized spaces. There might be millions of time series data points for a Buoy, and new data might be added every few seconds. You don't know how large all future data for a single Buoy will be.
When setting up the database, you need to set an upper limit on the size each of fields of a specific model. For example a text field has a max_length attribute. Sure, some types of fields support very large maximum sizes, but this will impact the performance of the database.
Instead, you need to create a separate table that will store all data Buoys
. Each row in this table will be a separate time series data. One field (column) in this database will be the id
of the Buoy
that the data belongs to. This field is automatically created and populated by the Django ForeignKey.
Upvotes: 4