glennpierce

Reputation: 221

Get difference of another field between first and last timestamps of grouping

I have a very large table called sensor_values with the columns timestamp, value and sensor_id, and another table called sensors which has, say, sensor_id and name.

I often perform a pivot query to get summed data grouped by day like this:

SELECT MIN(to_char(s1.timestamp::timestamptz, 'YYYY-MM-DD HH24:MI:SS TZ')) AS time,
      SUM(CASE WHEN s1.sensor_id = 572 THEN s1.value ELSE 0.0 END) AS "Nickname1",
      SUM(CASE WHEN s1.sensor_id = 542 THEN s1.value ELSE 0.0 END) AS "Nickname2",
      SUM(CASE WHEN s1.sensor_id = 571 THEN s1.value ELSE 0.0 END) AS "Nickname3"
FROM sensor_values s1
WHERE s1.timestamp::timestamptz >= '2013-10-14T00:00:00+00:00'::timestamptz
AND s1.timestamp::timestamptz <= '2013-10-18T00:00:00+00:00'::timestamptz
AND s1.sensor_id IN (572, 542, 571, 540, 541, 573)
GROUP BY date_trunc('day', s1.timestamp) ORDER BY 1 ; 

This works OK if a bit slow. However, is it possible to write a similar query that instead of summing the groups it gets the difference between the latest and earliest timestamps in each grouping, i.e. day in this case?

This is because I have some sensor data that is ever increasing (an electrical kWh meter) and I would like to know the consumption in a particular time frame.

Upvotes: 0

Views: 931

Answers (2)

Erwin Brandstetter

Reputation: 657022

Step 1: Release the handbrakes

... if a bit slow

Use instead:

SELECT to_char(MIN(ts)::timestamptz, 'YYYY-MM-DD HH24:MI:SS TZ') AS min_time
      ,SUM(CASE WHEN sensor_id = 572 THEN value ELSE 0.0 END) AS nickname1
      ,SUM(CASE WHEN sensor_id = 542 THEN value ELSE 0.0 END) AS nickname2
      ,SUM(CASE WHEN sensor_id = 571 THEN value ELSE 0.0 END) AS nickname3
FROM   sensor_values
-- LEFT JOIN sensor_values_cleaned s2 USING (sensor_id, ts)
WHERE  ts >= '2013-10-14T00:00:00+00:00'::timestamptz::timestamp
AND    ts <  '2013-10-18T00:00:00+00:00'::timestamptz::timestamp
AND    sensor_id IN (572, 542, 571, 540, 541, 573)
GROUP  BY ts::date
ORDER  BY 1;

Major points

  • Replace reserved words (in standard SQL) in your identifiers.
    timestamp -> ts
    time -> min_time

  • Since the join is on identical column names you can use the simpler USING clause in the join condition: USING (sensor_id, ts)
    However, since the second table sensor_values_cleaned is 100% irrelevant to this query, I removed it entirely.

  • As @joop already advised, switch min() and to_char() in your first output column. This way, Postgres can determine the minimum from the original column value, which is generally faster and may be able to utilize an index. In this specific case, ordering by date is also cheaper than ordering by a text, which would also have to consider collation rules.

  • A similar consideration applies to your WHERE condition. Compare:
    WHERE ts::timestamptz >= '2013-10-14T00:00:00+00:00'::timestamptz

    WHERE  ts >= '2013-10-14T00:00:00+00:00'::timestamptz::timestamp

    The second one is sargable: the column is left bare, so Postgres can utilize a plain index on ts, to great effect on performance in big tables!

  • Use ts::date instead of date_trunc('day', ts). Simpler, faster, and it produces the same groups.

  • Most probably your second WHERE condition is slightly incorrect. Generally, you would exclude the upper border. Instead of:

    AND    ts <=  '2013-10-18T00:00:00+00:00' ...

    use:

    AND    ts <   '2013-10-18T00:00:00+00:00' ...

  • When mixing timestamp and timestamptz one needs to be aware of the effects. For instance, your WHERE condition doesn't cut at 00:00 local time (except if local time coincides with UTC). Details here:
    Ignoring timezones altogether in Rails and PostgreSQL
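
Two of the points above can be sketched in a few statements. The index name is hypothetical, the column is assumed to have been renamed to ts, and Europe/Vienna is only an assumed session time zone picked for illustration:

```sql
-- Hypothetical index name. A plain btree index on the bare column
-- is what the sargable predicate (ts >= ...) can use.
CREATE INDEX sensor_values_ts_idx ON sensor_values (ts);

BEGIN;
SET LOCAL timezone = 'Europe/Vienna';  -- assumed zone, UTC+2 in October 2013

-- The cast to timestamp yields wall-clock time in the session time zone:
SELECT '2013-10-14T00:00:00+00:00'::timestamptz::timestamp AS session_local;
-- 2013-10-14 02:00:00

-- AT TIME ZONE pins the zone explicitly, independent of the session setting:
SELECT '2013-10-14T00:00:00+00:00'::timestamptz AT TIME ZONE 'UTC' AS utc_wall_clock;
-- 2013-10-14 00:00:00
ROLLBACK;
```

With the session setting, the lower bound lands at 02:00 local time, so the filter window and the ts::date groups do not line up unless the session runs in UTC.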

Step 2: Your request

...the difference between the latest and earliest timestamps in each grouping

And by that I suppose you mean:
...the difference between the value of the latest and earliest timestamps ...
Otherwise it would be much simpler.
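
For comparison, the literal reading (the time span between the earliest and latest timestamp per day) needs nothing more than plain aggregates. A minimal sketch, assuming the column has been renamed to ts as in Step 1; ts_span is just an illustrative alias:

```sql
SELECT ts::date          AS day
      ,max(ts) - min(ts) AS ts_span   -- interval between last and first reading
FROM   sensor_values
WHERE  ts >= '2013-10-14T00:00:00+00:00'::timestamptz::timestamp
AND    ts <  '2013-10-18T00:00:00+00:00'::timestamptz::timestamp
AND    sensor_id IN (540, 541, 542, 571, 572, 573)
GROUP  BY ts::date
ORDER  BY 1;
```

This only measures elapsed time per day. The harder part is the difference between the values recorded at those two timestamps.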

Use window functions for that, in particular first_value() and last_value(). Careful with the combination: you want a non-default window frame for last_value() in this case, because the default frame ends at the current row. Compare:
PostgreSQL aggregate or window function to return just the last value

I combine this with DISTINCT ON, which is more convenient in this case than GROUP BY (which would need another subquery level):

SELECT DISTINCT ON (ts::date, sensor_id)
       ts::date AS day
      ,to_char((min(ts)  OVER (PARTITION BY ts::date))::timestamptz
              ,'YYYY-MM-DD HH24:MI:SS TZ') AS min_time
      ,sensor_id
      ,last_value(value)    OVER (PARTITION BY ts::date, sensor_id ORDER BY ts
                     RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
       - first_value(value) OVER (PARTITION BY ts::date, sensor_id ORDER BY ts)
                                                                   AS val_range
FROM   sensor_values
WHERE  ts >= '2013-10-14T00:00:00+0'::timestamptz::timestamp
AND    ts <  '2013-10-18T00:00:00+0'::timestamptz::timestamp
AND    sensor_id IN (540, 541, 542, 571, 572, 573)
ORDER  BY ts::date, sensor_id;

-> SQLfiddle demo.
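
The frame caveat for last_value() can be checked on a few inline rows. This is a minimal sketch with made-up readings, not data from the question:

```sql
SELECT ts
      ,value
       -- Default frame ends at the current row, so this just echoes value.
      ,last_value(value) OVER (ORDER BY ts) AS last_default_frame
       -- Explicit frame spans the whole partition: the true last value.
      ,last_value(value) OVER (ORDER BY ts
                 RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
                                            AS last_full_frame
FROM  (VALUES (timestamp '2013-10-14 00:00', 100.0)
             ,(timestamp '2013-10-14 12:00', 110.0)
             ,(timestamp '2013-10-14 23:00', 125.0)) AS t(ts, value)
ORDER  BY ts;

--  last_default_frame: 100.0, 110.0, 125.0
--  last_full_frame:    125.0, 125.0, 125.0
```

first_value() does not need the extended frame, because the default frame always starts at the first row of the partition.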

Step 3: Pivot table

Building on the query above I use crosstab() from the additional module tablefunc:

SELECT * FROM crosstab(
   $$SELECT DISTINCT ON (1,3)
            ts::date AS day
           ,to_char((min(ts) OVER (PARTITION BY ts::date))::timestamptz,'YYYY-MM-DD HH24:MI:SS TZ') AS min_time
           ,sensor_id
           ,last_value(value)    OVER (PARTITION BY ts::date, sensor_id ORDER BY ts RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
            - first_value(value) OVER (PARTITION BY ts::date, sensor_id ORDER BY ts) AS val_range
     FROM   sensor_values
     WHERE  ts >= '2013-10-14T00:00:00+0'::timestamptz::timestamp
     AND    ts <  '2013-10-18T00:00:00+0'::timestamptz::timestamp
     AND    sensor_id IN (540, 541, 542, 571, 572, 573)
     ORDER  BY 1, 3$$

   ,$$VALUES (540), (541), (542), (571), (572), (573)$$
   )
AS ct (day date, min_time text, s540 numeric, s541 numeric, s542 numeric, s571 numeric, s572 numeric, s573 numeric);

Returns (and much faster than before):

    day     |         min_time         | s540  | s541  | s542  | s571  | s572  | s573
------------+--------------------------+-------+-------+-------+-------+-------+-------
 2013-10-14 | 2013-10-14 03:00:00 CEST | 18.82 | 18.98 | 19.97 | 19.47 | 17.56 | 21.27
 2013-10-15 | 2013-10-15 00:15:00 CEST | 22.59 | 24.20 | 22.90 | 21.27 | 22.75 | 22.23
 2013-10-16 | 2013-10-16 00:16:00 CEST | 23.74 | 22.52 | 22.23 | 23.22 | 23.03 | 22.98
 2013-10-17 | 2013-10-17 00:17:00 CEST | 21.68 | 24.54 | 21.15 | 23.58 | 23.04 | 21.94
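
crosstab() is only available after the tablefunc module has been installed in the database. A minimal sketch, assuming sufficient privileges and using a few made-up rows in place of sensor_values, shows the two-parameter form in isolation:

```sql
-- Install the module once per database.
CREATE EXTENSION IF NOT EXISTS tablefunc;

SELECT * FROM crosstab(
   -- Source query: row name, category, value (made-up rows, ordered by row name)
   $$VALUES (date '2013-10-14', 540, 1.5)
           ,(date '2013-10-14', 541, 2.0)
           ,(date '2013-10-15', 541, 3.25)$$

   -- Category query: one output column per listed sensor_id, in this order
  ,$$VALUES (540), (541)$$
   )
AS ct (day date, s540 numeric, s541 numeric);

--     day     | s540 | s541
--  2013-10-14 |  1.5 |  2.0
--  2013-10-15 |      | 3.25
```

A sensor without a row for a given day yields NULL in its column instead of shifting the remaining values to the left, which is the reason to pass the second parameter.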

Upvotes: 2

joop

Reputation: 4513

Try replacing

SELECT MIN(to_char(s1.timestamp::timestamptz, 'YYYY-MM-DD HH24:MI:SS TZ')) AS time,

by:

SELECT to_char(MIN(s1.timestamp)::timestamptz, 'YYYY-MM-DD HH24:MI:SS TZ') AS zztime,

or even:

SELECT MIN(s1.timestamp) AS zztime,

since the timestamp format you specify is more or less the default.

This will avoid the min-selection on a computed expression.

BTW: timestamp and time are both reserved words (type names) in (postgres) SQL. Try to avoid using them as identifiers.

Upvotes: 0
