Scott Miller

Reputation: 332

Select a column from a different row

I have the following table on AWS Redshift:

node_id    power_source    timestamp
----------------------------------------------
108          LINE         2019-09-10 09:15:30
108          BATT         2019-09-10 10:20:15
108          LINE         2019-09-10 13:45:00
108          LINE         2019-09-11 06:00:15
108          BATT         2019-09-12 05:50:15
108          BATT         2019-09-12 12:15:15
108          LINE         2019-09-12 18:45:15
108          LINE         2019-09-13 09:20:15
108          BATT         2019-09-14 11:20:15
108          BATT         2019-09-14 13:30:15
108          BATT         2019-09-14 15:30:15
108          LINE         2019-09-14 16:48:36
108          LINE         2019-09-15 09:20:15

I am trying to figure out how long, cumulatively, the node's power_source is 'BATT'. I think I could do a datediff on the timestamps, but I would need the timestamp of the first 'LINE' row after each 'BATT' row (ordered by timestamp), and I am not sure how to get that value. Once I have it, I could just SUM() the datediff.

Edit:

Here is the expected result

node_id    power_source    timestamp             ts_line_power          ts_diff(in mins)
-----------------------------------------------------------------------------------------
108          BATT         2019-09-10 10:20:15    2019-09-10 13:45:00    205
108          BATT         2019-09-12 05:50:15    2019-09-12 18:45:15    775
108          BATT         2019-09-14 11:20:15    2019-09-14 16:48:36    328

Any help/assistance would be appreciated

Upvotes: 0

Views: 36

Answers (1)

Gordon Linoff

Reputation: 1269443

If I understand correctly, you can use lead():

select node_id,
       sum(datediff(minute, timestamp, next_ts)) as diff_in_minutes
from (select t.*,
             lead(timestamp) over (partition by node_id order by timestamp) as next_ts
      from t
     ) t
where power_source = 'BATT'
group by node_id;

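This takes the timestamp of the row that follows each BATT record and uses it as the end time. Consecutive BATT rows chain into one another, so the sum covers each BATT run up to the LINE row that ends it.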
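As a rough check of the lead() approach, here is a minimal sketch that runs the same query shape against the sample rows in an in-memory SQLite database from Python. Several things are assumptions made only so it runs locally: the column is renamed to ts, SQLite has no datediff, so the hypothetical helper epoch_minute truncates each timestamp to the minute to approximate how Redshift's datediff(minute, ...) counts minute boundaries, and window functions need SQLite 3.25 or later.

```python
import sqlite3

# Sample rows from the question: (node_id, power_source, timestamp).
ROWS = [
    (108, "LINE", "2019-09-10 09:15:30"),
    (108, "BATT", "2019-09-10 10:20:15"),
    (108, "LINE", "2019-09-10 13:45:00"),
    (108, "LINE", "2019-09-11 06:00:15"),
    (108, "BATT", "2019-09-12 05:50:15"),
    (108, "BATT", "2019-09-12 12:15:15"),
    (108, "LINE", "2019-09-12 18:45:15"),
    (108, "LINE", "2019-09-13 09:20:15"),
    (108, "BATT", "2019-09-14 11:20:15"),
    (108, "BATT", "2019-09-14 13:30:15"),
    (108, "BATT", "2019-09-14 15:30:15"),
    (108, "LINE", "2019-09-14 16:48:36"),
    (108, "LINE", "2019-09-15 09:20:15"),
]


def epoch_minute(col):
    """Hypothetical helper: SQL for the epoch minute of `col`, seconds dropped."""
    return f"cast(strftime('%s', substr({col}, 1, 16) || ':00') as integer) / 60"


# Same shape as the lead() query, with datediff replaced by the helper above.
QUERY = f"""
select node_id,
       sum({epoch_minute('next_ts')} - {epoch_minute('ts')}) as diff_in_minutes
from (select t.*,
             lead(ts) over (partition by node_id order by ts) as next_ts
      from t) t
where power_source = 'BATT'
group by node_id
"""

conn = sqlite3.connect(":memory:")
conn.execute("create table t (node_id integer, power_source text, ts text)")
conn.executemany("insert into t values (?, ?, ?)", ROWS)
total = conn.execute(QUERY).fetchall()
print(total)
```

On the sample data this yields a single row per node with the cumulative minutes on battery, since each BATT row contributes the gap to whatever row comes next.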

EDIT:

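The above is the overall total across all "BATT" periods. To get one row per period, you have a gaps-and-islands problem. For that, you can assign a group by counting the number of "LINE" records at or after each row (ordering by timestamp descending). Each run of "BATT" rows then lands in the same group as the "LINE" row that ends it.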

This is all window functions and aggregation:

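select node_id, min(timestamp) as ts_batt, max(timestamp) as ts_line_power,
       datediff(minute, min(timestamp), max(timestamp)) as diff_in_minutes
from (select t.*,
             -- running count of LINE rows at or after this row;
             -- Redshift requires an explicit frame for an ordered aggregate window
             sum( (power_source = 'LINE')::int ) over (partition by node_id order by timestamp desc
                                                       rows between unbounded preceding and current row) as grp
      from t
     ) t
group by node_id, grp
having sum( (power_source = 'BATT')::int) > 0;  -- only include groups that have at least one BATT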

Note that this assumes that only "LINE" and "BATT" are valid values for the power source.
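To see how the grouping behaves on the sample rows, here is a minimal sketch that runs the per-period query in an in-memory SQLite database from Python. It is not Redshift: the column is renamed to ts, the boolean comparison is summed directly instead of cast with ::int, and the hypothetical helper epoch_minute stands in for datediff(minute, ...) by truncating timestamps to the minute. Window functions need SQLite 3.25 or later.

```python
import sqlite3

# Sample rows from the question: (node_id, power_source, timestamp).
ROWS = [
    (108, "LINE", "2019-09-10 09:15:30"),
    (108, "BATT", "2019-09-10 10:20:15"),
    (108, "LINE", "2019-09-10 13:45:00"),
    (108, "LINE", "2019-09-11 06:00:15"),
    (108, "BATT", "2019-09-12 05:50:15"),
    (108, "BATT", "2019-09-12 12:15:15"),
    (108, "LINE", "2019-09-12 18:45:15"),
    (108, "LINE", "2019-09-13 09:20:15"),
    (108, "BATT", "2019-09-14 11:20:15"),
    (108, "BATT", "2019-09-14 13:30:15"),
    (108, "BATT", "2019-09-14 15:30:15"),
    (108, "LINE", "2019-09-14 16:48:36"),
    (108, "LINE", "2019-09-15 09:20:15"),
]


def epoch_minute(expr):
    """Hypothetical helper: SQL for the epoch minute of `expr`, seconds dropped."""
    return f"cast(strftime('%s', substr({expr}, 1, 16) || ':00') as integer) / 60"


# grp = number of LINE rows at or after each row, so a BATT run shares its
# group with the LINE row that ends it; LINE-only groups are filtered out.
QUERY = f"""
select node_id,
       min(ts) as ts_batt,
       max(ts) as ts_line_power,
       {epoch_minute('max(ts)')} - {epoch_minute('min(ts)')} as diff_in_minutes
from (select t.*,
             sum(power_source = 'LINE') over (
                 partition by node_id order by ts desc
                 rows between unbounded preceding and current row) as grp
      from t) t
group by node_id, grp
having sum(power_source = 'BATT') > 0
order by min(ts)
"""

conn = sqlite3.connect(":memory:")
conn.execute("create table t (node_id integer, power_source text, ts text)")
conn.executemany("insert into t values (?, ?, ?)", ROWS)
periods = conn.execute(QUERY).fetchall()
for row in periods:
    print(row)
```

On the sample data this gives three periods of 205, 775 and 328 minutes; the middle one runs from 05:50 to 18:45, which is 12 h 55 min.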

Upvotes: 2
