Reputation: 11
In below query, I am calculating the minimum, the maximum, and the average for a two hour interval using PostgreSQL.
The query works fine for even start hours (..04:00:00+05:30)
, but it gives similar result as that of even start time for odd start hours (..05:00:00+05:30)
.
The multiple by 2 returns even hours which is the problem.
SELECT tagid, CAST(sample_time_stamp as Date) AS stat_date,
(floor(extract(hour from sample_time_stamp)/2) * 2)::int AS hrs,
min(sensor_reading) AS theMin,
max(sensor_reading) AS theMax,
avg(sensor_reading) AS theAvg
FROM sensor_readings WHERE tagid =1 AND
sample_time_stamp BETWEEN '2012-10-23 01:00:00+05:30'
AND '2012-10-23 05:59:00+05:30'
GROUP BY tagid,CAST(sample_time_stamp as Date),
floor(extract(hour from sample_time_stamp)/2) * 2
ORDER BY tagid,stat_date, hrs
tagid date hrs theMin themax theAvg
1 2012-10-23 0 6 58 30.95
1 2012-10-23 2 2 59 29.6916666666667
1 2012-10-23 4 3 89 31.7666666666667
tagid date hrs theMin themax theAvg
1 2012-10-23 2 2 59 29.6916666666667
1 2012-10-23 4 3 89 31.7666666666667
Upvotes: 1
Views: 1788
Reputation: 656381
To get constant time-frames starting with your minimum timestamp:
WITH params AS (
SELECT '2012-10-23 01:00:00+05:30'::timestamptz AS _min -- input params
,'2012-10-23 05:59:00+05:30'::timestamptz AS _max
,'2 hours'::interval AS _interval
)
,ts AS (SELECT generate_series(_min, _max, _interval) AS t_min FROM params)
,timeframe AS (
SELECT t_min
,lead(t_min, 1, _max) OVER (ORDER BY t_min) AS t_max
FROM ts, params
)
SELECT s.tagid
,t.t_min
,t.t_max -- mildly redundant except for last row
,min(s.sensor_reading) AS the_min
,max(s.sensor_reading) AS the_max
,avg(s.sensor_reading) AS the_avg
FROM timeframe t
LEFT JOIN sensor_readings s ON s.tagid = 1
AND s.sample_time_stamp >= t.t_min
AND s.sample_time_stamp < t.t_max
GROUP BY 1,2,3
ORDER BY 1,2;
Can be used for any time frame and any interval length. Requires PostgreSQL 8.4 or later.
If the maximum timestamp _max
does not fall on _min + n * _interval
the last time-frame is truncated. The last row can therefore represent a shorter time-frame than your desired _interval
.
Common Table Expressions (CTE) for easier handling. Input parameter values once in the top CTE params
.
generate_series()
for intervals to create the time raster.
Window function lead(...)
with 3 parameters (including default) - to cover the special case of last row.
LEFT JOIN
between raster and actual data, so that time frames without matching data will still show in the result (with NULL
values as data).
That's also the reason for a later edit: WHERE
condition had to move to the LEFT JOIN
condition, to achieve that.
WITH RECURSIVE params AS (
SELECT '2012-10-23 01:00:00+05:30'::timestamptz AS _min -- input params
,'2012-10-23 05:59:00+05:30'::timestamptz AS _max
,'2 hours'::interval AS _interval
)
, timeframe AS (
SELECT _min AS t_min, LEAST(_min + _interval, _max) AS t_max
FROM params
UNION ALL
SELECT t_max, LEAST(t_max + _interval, _max)
FROM timeframe t, params p
WHERE t_max < _max
)
SELECT ...
Slightly faster ... take your pick.
-> sqlfiddle displaying both.
Note that you can have non-recursive CTEs (additionally) even when declared WITH RECURSIVE
.
Should be faster than your original query. Half the code deals with generating the time raster, which concerns few rows and is very fast. Handling actual table rows (the expensive part) gets cheaper, because we don't calculate a new value from every sample_time_stamp
any more.
You should definitely have a multi-column index of the form:
CREATE INDEX foo_idx ON sensor_readings (tagid, sample_time_stamp DESC);
I use DESC
on the assumption that you more often query recent entries (later timestamps). Remove the modifier if that's not the case. Doesn't make a big difference either way.
Upvotes: 3