Reputation: 221
I have a very large table called sensor_values with the columns timestamp, value, sensor_id and another table called sensors which has, say, sensor_id, name.
I often perform a pivot query to get summed data grouped by day like this:
SELECT MIN(to_char(s1.timestamp::timestamptz, 'YYYY-MM-DD HH24:MI:SS TZ')) AS time,
SUM(CASE WHEN s1.sensor_id = 572 THEN s1.value ELSE 0.0 END) AS "Nickname1",
SUM(CASE WHEN s1.sensor_id = 542 THEN s1.value ELSE 0.0 END) AS "Nickname2",
SUM(CASE WHEN s1.sensor_id = 571 THEN s1.value ELSE 0.0 END) AS "Nickname3"
FROM sensor_values s1
WHERE s1.timestamp::timestamptz >= '2013-10-14T00:00:00+00:00'::timestamptz
AND s1.timestamp::timestamptz <= '2013-10-18T00:00:00+00:00'::timestamptz
AND s1.sensor_id IN (572, 542, 571, 540, 541, 573)
GROUP BY date_trunc('day', s1.timestamp) ORDER BY 1 ;
This works OK if a bit slow. However, is it possible to write a similar query that instead of summing the groups it gets the difference between the latest and earliest timestamps in each grouping, i.e. day in this case?
This is because I have some sensor data that is ever increasing (an electrical kWh meter) and I would like to know the consumption in a particular time frame.
Upvotes: 0
Views: 931
Reputation: 657022
... if a bit slow
SELECT to_char(MIN(ts)::timestamptz, 'YYYY-MM-DD HH24:MI:SS TZ') AS min_time
      ,SUM(CASE WHEN sensor_id = 572 THEN value ELSE 0.0 END) AS nickname1
      ,SUM(CASE WHEN sensor_id = 542 THEN value ELSE 0.0 END) AS nickname2
      ,SUM(CASE WHEN sensor_id = 571 THEN value ELSE 0.0 END) AS nickname3
FROM   sensor_values
-- LEFT JOIN sensor_values_cleaned s2 USING (sensor_id, ts)
WHERE  ts >= '2013-10-14T00:00:00+00:00'::timestamptz::timestamp
AND    ts <  '2013-10-18T00:00:00+00:00'::timestamptz::timestamp
AND    sensor_id IN (572, 542, 571, 540, 541, 573)
GROUP  BY ts::date
ORDER  BY ts::date;
Replace reserved words (in standard SQL) in your identifiers: timestamp -> ts, time -> min_time.
Since the join is on identical column names you can use the simpler USING clause in the join condition: USING (sensor_id, ts). However, since the second table sensor_values_cleaned is 100% irrelevant to this query, I removed it entirely.
As @joop already advised, switch min() and to_char() in your first output column. This way, Postgres can determine the minimum from the original column value, which is generally faster and may be able to utilize an index. In this specific case, ordering by date is also cheaper than ordering by a text, which would also have to consider collation rules.
A similar consideration applies to your WHERE condition. Compare:
WHERE ts::timestamptz >= '2013-10-14T00:00:00+00:00'::timestamptz
WHERE ts >= '2013-10-14T00:00:00+00:00'::timestamptz::timestamp
The second one is sargable: it compares the bare column, so it can utilize a plain index on ts, to great effect on performance in big tables!
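To make the sargable point concrete, here is a minimal sketch. It runs on Python's built-in sqlite3 module rather than Postgres, purely so that it is self-contained; the planner output and the datetime() function are SQLite's, and the index name sensor_values_ts_idx is made up for the example. The principle should carry over: a predicate on the bare column can be answered with an index search, while a predicate on an expression over the column falls back to a scan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sensor_values (ts TEXT, sensor_id INTEGER, value REAL)")
# Hypothetical index name; a plain index on the raw column.
conn.execute("CREATE INDEX sensor_values_ts_idx ON sensor_values (ts)")


def plan(where_clause):
    """Return the plan detail strings SQLite reports for the given WHERE clause."""
    sql = "EXPLAIN QUERY PLAN SELECT value FROM sensor_values WHERE " + where_clause
    return [row[3] for row in conn.execute(sql)]


# Column wrapped in a function: the index on ts cannot drive the lookup.
wrapped_plan = plan("datetime(ts) >= '2013-10-14 00:00:00'")
# Bare column compared to a constant: sargable, so the index can be searched.
sargable_plan = plan("ts >= '2013-10-14 00:00:00'")

print(wrapped_plan)
print(sargable_plan)
```

In Postgres the equivalent check would be EXPLAIN on both variants of the WHERE clause.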
Use ts::date instead of date_trunc('day', ts). Simpler, faster, and it groups the rows identically.
Most probably your second WHERE condition is slightly incorrect. Generally, you would exclude the upper border, so use the second form rather than the first:
AND ts <= '2013-10-18T00:00:00+00:00' ...
AND ts < '2013-10-18T00:00:00+00:00' ...
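A small sketch of why the upper border matters, in plain Python with made-up reading timestamps (none of these values come from the real table). With an inclusive upper border, a reading stamped exactly at midnight belongs to two adjacent ranges and gets counted twice; with the border excluded, every reading lands in exactly one range.

```python
from datetime import datetime

day_14 = datetime(2013, 10, 14)
day_18 = datetime(2013, 10, 18)
day_22 = datetime(2013, 10, 22)

# Hypothetical reading timestamps; the middle one sits exactly on the border.
readings = [
    datetime(2013, 10, 17, 23, 45),
    datetime(2013, 10, 18, 0, 0),
    datetime(2013, 10, 18, 0, 15),
]


def count_closed(lo, hi):
    """Count readings with lo <= ts <= hi (upper border included)."""
    return sum(lo <= ts <= hi for ts in readings)


def count_half_open(lo, hi):
    """Count readings with lo <= ts < hi (upper border excluded)."""
    return sum(lo <= ts < hi for ts in readings)


# Two adjacent ranges: 14th to 18th, then 18th to 22nd.
closed_total = count_closed(day_14, day_18) + count_closed(day_18, day_22)
half_open_total = count_half_open(day_14, day_18) + count_half_open(day_18, day_22)

print(closed_total, half_open_total)
```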
When mixing timestamp and timestamptz, one needs to be aware of the effects. For instance, your WHERE condition doesn't cut at 00:00 local time (except if local time coincides with UTC). Details here:
Ignoring timezones altogether in Rails and PostgreSQL
...the difference between the latest and earliest timestamps in each grouping
And by that I suppose you mean:
...the difference between the value of the latest and earliest timestamps ...
Otherwise it would be much simpler.
Use window functions for that, in particular first_value() and last_value(). Careful with the combination: you want a non-default window frame for last_value() in this case, because with ORDER BY the default frame ends at the current row. Compare:
PostgreSQL aggregate or window function to return just the last value
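The frame issue is easy to reproduce. The following is only a sketch on Python's built-in sqlite3 module (an assumption made to keep it self-contained; it needs SQLite 3.25 or later for window functions, and date(ts) stands in for ts::date). The three readings are invented. With the default frame, last_value() just returns the value of the current row; with the frame extended to UNBOUNDED FOLLOWING it returns the last value of the whole partition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sensor_values (ts TEXT, sensor_id INTEGER, value REAL)")
# Hypothetical meter readings for one sensor on one day.
conn.executemany(
    "INSERT INTO sensor_values VALUES (?, ?, ?)",
    [
        ("2013-10-14 00:15:00", 572, 100.0),
        ("2013-10-14 12:00:00", 572, 107.5),
        ("2013-10-14 23:45:00", 572, 118.0),
    ],
)

frame_rows = conn.execute(
    """
    SELECT ts,
           last_value(value) OVER (PARTITION BY date(ts), sensor_id ORDER BY ts)
               AS last_default_frame,
           last_value(value) OVER (PARTITION BY date(ts), sensor_id ORDER BY ts
                                   RANGE BETWEEN UNBOUNDED PRECEDING
                                             AND UNBOUNDED FOLLOWING)
               AS last_full_frame
    FROM sensor_values
    ORDER BY ts
    """
).fetchall()

for row in frame_rows:
    print(row)
```

first_value() does not need the extended frame, since the default frame always starts at the first row of the partition.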
I combine this with DISTINCT ON, which is more convenient in this case than GROUP BY (which would need another subquery level):
SELECT DISTINCT ON (ts::date, sensor_id)
       ts::date AS day
      ,to_char((min(ts) OVER (PARTITION BY ts::date))::timestamptz
              ,'YYYY-MM-DD HH24:MI:SS TZ') AS min_time
      ,sensor_id
      ,last_value(value)  OVER (PARTITION BY ts::date, sensor_id ORDER BY ts
                                RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
     - first_value(value) OVER (PARTITION BY ts::date, sensor_id ORDER BY ts)
                                                   AS val_range
FROM   sensor_values
WHERE  ts >= '2013-10-14T00:00:00+0'::timestamptz::timestamp
AND    ts <  '2013-10-18T00:00:00+0'::timestamptz::timestamp
AND    sensor_id IN (540, 541, 542, 571, 572, 573)
ORDER  BY ts::date, sensor_id;
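To check the last-minus-first logic on a toy data set, here is a rough sketch using Python's built-in sqlite3 module (SQLite 3.25 or later). It is not the Postgres query itself: SQLite has no DISTINCT ON, so the sketch falls back to plain SELECT DISTINCT, uses date(ts) in place of ts::date, and leaves out the min_time column. The function name daily_ranges and all readings are made up for illustration.

```python
import sqlite3


def daily_ranges(readings):
    """Return (day, sensor_id, last value - first value) for each day and sensor."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sensor_values (ts TEXT, sensor_id INTEGER, value REAL)")
    conn.executemany("INSERT INTO sensor_values VALUES (?, ?, ?)", readings)
    query = """
        SELECT DISTINCT
               date(ts) AS day,
               sensor_id,
               last_value(value) OVER w - first_value(value) OVER w AS val_range
        FROM sensor_values
        WHERE ts >= '2013-10-14 00:00:00'
          AND ts <  '2013-10-18 00:00:00'
        WINDOW w AS (PARTITION BY date(ts), sensor_id ORDER BY ts
                     RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
        ORDER BY day, sensor_id
    """
    return conn.execute(query).fetchall()


# Hypothetical, ever-increasing meter readings.
sample = [
    ("2013-10-14 00:15:00", 572, 100.0),
    ("2013-10-14 12:00:00", 572, 107.5),
    ("2013-10-14 23:45:00", 572, 118.0),
    ("2013-10-14 06:00:00", 542, 50.0),
    ("2013-10-14 18:00:00", 542, 62.25),
    ("2013-10-15 01:00:00", 572, 118.0),
    ("2013-10-15 22:00:00", 572, 130.5),
    ("2013-10-18 00:00:00", 572, 150.0),  # on the excluded upper border
]
ranges = daily_ranges(sample)
for row in ranges:
    print(row)
```

Every row of a partition carries the same val_range, which is why a DISTINCT step is needed to collapse them to one row per day and sensor.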
Building on the DISTINCT ON query above, I use crosstab() from the additional module tablefunc:
SELECT * FROM crosstab(
   $$SELECT DISTINCT ON (1,3)
            ts::date AS day
           ,to_char((min(ts) OVER (PARTITION BY ts::date))::timestamptz, 'YYYY-MM-DD HH24:MI:SS TZ') AS min_time
           ,sensor_id
           ,last_value(value)  OVER (PARTITION BY ts::date, sensor_id ORDER BY ts
                                     RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
          - first_value(value) OVER (PARTITION BY ts::date, sensor_id ORDER BY ts) AS val_range
     FROM   sensor_values
     WHERE  ts >= '2013-10-14T00:00:00+0'::timestamptz::timestamp
     AND    ts <  '2013-10-18T00:00:00+0'::timestamptz::timestamp
     AND    sensor_id IN (540, 541, 542, 571, 572, 573)
     ORDER  BY 1, 3$$
  ,$$VALUES (540), (541), (542), (571), (572), (573)$$
   )
AS ct (day date, min_time text, s540 numeric, s541 numeric, s542 numeric, s571 numeric, s572 numeric, s573 numeric);
Returns (and much faster than before):
    day     |         min_time         | s540  | s541  | s542  | s571  | s572  | s573
------------+--------------------------+-------+-------+-------+-------+-------+-------
 2013-10-14 | 2013-10-14 03:00:00 CEST | 18.82 | 18.98 | 19.97 | 19.47 | 17.56 | 21.27
 2013-10-15 | 2013-10-15 00:15:00 CEST | 22.59 | 24.20 | 22.90 | 21.27 | 22.75 | 22.23
 2013-10-16 | 2013-10-16 00:16:00 CEST | 23.74 | 22.52 | 22.23 | 23.22 | 23.03 | 22.98
 2013-10-17 | 2013-10-17 00:17:00 CEST | 21.68 | 24.54 | 21.15 | 23.58 | 23.04 | 21.94
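For readers unfamiliar with crosstab(), this is roughly what the two-parameter form does with the long rows, sketched in plain Python. The helper name pivot_by_sensor is made up, the min_time column is left out for brevity, and sensor 999 is a hypothetical stray category. As in crosstab(), categories missing for a day come out empty (NULL in Postgres, None here) and values whose category is not in the list are ignored.

```python
def pivot_by_sensor(rows, sensor_ids):
    """Turn (day, sensor_id, val_range) rows into one wide row per day."""
    wide = {}
    for day, sensor_id, val_range in rows:
        if sensor_id not in sensor_ids:
            continue  # categories outside the list are dropped
        cells = wide.setdefault(day, dict.fromkeys(sensor_ids))
        cells[sensor_id] = val_range
    return [
        (day, *(cells[s] for s in sensor_ids))
        for day, cells in sorted(wide.items())
    ]


long_rows = [
    ("2013-10-14", 540, 18.82),
    ("2013-10-14", 572, 17.56),
    ("2013-10-15", 540, 22.59),
    ("2013-10-15", 999, 1.0),  # hypothetical sensor not in the category list
]
wide_rows = pivot_by_sensor(long_rows, [540, 541, 572])
for row in wide_rows:
    print(row)
```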
Upvotes: 2
Reputation: 4513
Try replacing
SELECT MIN(to_char(s1.timestamp::timestamptz, 'YYYY-MM-DD HH24:MI:SS TZ')) AS time,
by:
SELECT to_char(MIN(s1.timestamp)::timestamptz, 'YYYY-MM-DD HH24:MI:SS TZ') AS zztime,
or even:
SELECT MIN(s1.timestamp) AS zztime,
since the timestamp format you specify is more or less the default.
This avoids taking the minimum of a computed expression.
BTW: timestamp and time are both type names in PostgreSQL and reserved words in standard SQL. Try to avoid using them as identifiers.
Upvotes: 0