Reputation: 5574
I have a MySQL database storing a long, detailed time series (i.e., years of measurements sampled several times/minute). When I give this data to a client for display, I'd like to downsample the data so that I'm not sending massive amounts of data to graph. For example, if the client needs to display 1 year's worth of data, I don't want to send every sample from that year; I'd rather send, say, one point for every hour.
I'm using Flask/SQLAlchemy, and obviously I could just pull all the data from that interval and process it in Flask, but I'd prefer to accomplish this with SQL/SQLAlchemy for efficiency and so I don't have to re-implement Flask-SQLAlchemy's pagination. One way to do this would be to filter on datetime for rows that have some time fields==0; for example, if I got a year's worth of data where MM:SS==00:00, I'd get one point per hour, which is pretty much what I need. Is this possible? Is there a better way to get every Nth row from a query result in SQL or SQLAlchemy?
Edit for clarity: I am looking for an efficient way to return every Nth row in a DATETIME range. I can get every row in a range pretty easily; it's the every-Nth downsampling that's got me. I don't necessarily need this to work for arbitrary N; it's sufficient if I can get, say, every row with minutes==0 and seconds==0, or every row with seconds==0, etc.
Upvotes: 0
Views: 163
Reputation: 34232
It is possible to get a single value per hour using GROUP BY and an aggregate function such as min() or avg():
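-- one row per hour: the hour bucket and the average value within it
-- (single-quoted format string, so it also works with the ANSI_QUOTES SQL mode)
select date_format(observation_date, '%Y %m %d %H') as obs_hour, avg(observation_value) as avg_value
from observations
group by date_format(observation_date, '%Y %m %d %H')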
The query above groups the observations by hour using the date_format() function and returns the average of the observed values within each hour. You can use a different aggregate function, such as min() or max(), to get a different value from each group, depending on which sampling technique suits you.
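If you want to try the grouping idea without a MySQL server at hand, here is a minimal sketch using Python's built-in sqlite3 module. Note the assumptions: SQLite stands in for MySQL, so its strftime() replaces date_format(); the date-range filter, the hourly_averages helper and the generated sample data are all made up for illustration and are not part of the original query.

```python
import sqlite3
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"  # text format SQLite's date functions understand


def hourly_averages(conn, start, end):
    """Return (hour_bucket, avg_value) for start <= observation_date < end.

    Hypothetical helper: SQLite's strftime() plays the role of MySQL's
    date_format() here.
    """
    query = """
        SELECT strftime('%Y-%m-%d %H', observation_date) AS obs_hour,
               AVG(observation_value) AS avg_value
        FROM observations
        WHERE observation_date >= ? AND observation_date < ?
        GROUP BY obs_hour
        ORDER BY obs_hour
    """
    return conn.execute(query, (start.strftime(FMT), end.strftime(FMT))).fetchall()


# Made-up demo data: one sample every 20 seconds for 3 hours (180 per hour).
# The value is simply the hour index, so each hourly average is easy to check.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE observations (observation_date TEXT, observation_value REAL)"
)
t0 = datetime(2020, 1, 1)
samples = [
    ((t0 + timedelta(seconds=20 * i)).strftime(FMT), float(i // 180))
    for i in range(3 * 180)
]
conn.executemany("INSERT INTO observations VALUES (?, ?)", samples)

result = hourly_averages(conn, t0, t0 + timedelta(hours=3))
for obs_hour, avg_value in result:
    print(obs_hour, avg_value)
```

The 540 input rows collapse to one row per hour. Filtering on the date range in the WHERE clause before grouping means only the rows in the requested interval are aggregated.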
You can also select only those observations that were taken exactly on the hour (minute and second both zero) using a WHERE clause:
select *
from observations
where minute(observation_date)=0 and second(observation_date)=0
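The same filter can be tried out locally with Python's built-in sqlite3 module. This is only a sketch under stated assumptions: SQLite stands in for MySQL, so strftime('%M:%S', ...) = '00:00' replaces the minute() and second() checks, and the date-range bounds, the on_the_hour_rows helper and the sample data are hypothetical.

```python
import sqlite3
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"  # text format SQLite's date functions understand


def on_the_hour_rows(conn, start, end):
    """Return rows in [start, end) whose timestamp is exactly hh:00:00.

    Hypothetical helper: the strftime() comparison is SQLite's equivalent of
    MySQL's minute(...) = 0 and second(...) = 0.
    """
    query = """
        SELECT observation_date, observation_value
        FROM observations
        WHERE observation_date >= ? AND observation_date < ?
          AND strftime('%M:%S', observation_date) = '00:00'
        ORDER BY observation_date
    """
    return conn.execute(query, (start.strftime(FMT), end.strftime(FMT))).fetchall()


# Made-up demo data: one sample every 20 seconds for 3 hours.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE observations (observation_date TEXT, observation_value REAL)"
)
t0 = datetime(2020, 1, 1)
samples = [
    ((t0 + timedelta(seconds=20 * i)).strftime(FMT), float(i))
    for i in range(3 * 180)
]
conn.executemany("INSERT INTO observations VALUES (?, ?)", samples)

hourly = on_the_hour_rows(conn, t0, t0 + timedelta(hours=3))
for observation_date, observation_value in hourly:
    print(observation_date, observation_value)
```

Keep in mind that this kind of filter only returns a row for an hour if a sample happened to be stored at exactly hh:00:00; if your timestamps drift by a second or two, the group by approach is the safer choice.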
Upvotes: 1