So I have the following table on AWS Redshift:
node_id power_source timestamp
----------------------------------------------
108 LINE 2019-09-10 09:15:30
108 BATT 2019-09-10 10:20:15
108 LINE 2019-09-10 13:45:00
108 LINE 2019-09-11 06:00:15
108 BATT 2019-09-12 05:50:15
108 BATT 2019-09-12 12:15:15
108 LINE 2019-09-12 18:45:15
108 LINE 2019-09-13 09:20:15
108 BATT 2019-09-14 11:20:15
108 BATT 2019-09-14 13:30:15
108 BATT 2019-09-14 15:30:15
108 LINE 2019-09-14 16:48:36
108 LINE 2019-09-15 09:20:15
I am trying to figure out how long (cumulatively) the node's power_source is on 'BATT'. I think I could do a datediff on the timestamps, but I would need the timestamp of the first 'LINE' row after each 'BATT' row (ordered by timestamp). I'm not sure how to get that value, though. Once I have it, I could just SUM() the datediffs.
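The plan described above (pair each run of BATT rows with the first LINE row after it, take the minute difference, then sum) can be sanity-checked outside the database. This is a minimal Python sketch, assuming the rows belong to one node and are already sorted by timestamp. The helper names `minute_diff` and `battery_runs` are hypothetical, and `minute_diff` assumes Redshift's `datediff(minute, ...)` counts minute boundaries crossed (seconds ignored), which is why the first run gives 205 rather than 204.75. The middle run, 05:50:15 to 18:45:15, is 12 h 55 min, i.e. 775 minutes.

```python
from datetime import datetime

# Sample rows from the question (node_id, power_source, timestamp), sorted by timestamp.
ROWS = [
    (108, "LINE", "2019-09-10 09:15:30"),
    (108, "BATT", "2019-09-10 10:20:15"),
    (108, "LINE", "2019-09-10 13:45:00"),
    (108, "LINE", "2019-09-11 06:00:15"),
    (108, "BATT", "2019-09-12 05:50:15"),
    (108, "BATT", "2019-09-12 12:15:15"),
    (108, "LINE", "2019-09-12 18:45:15"),
    (108, "LINE", "2019-09-13 09:20:15"),
    (108, "BATT", "2019-09-14 11:20:15"),
    (108, "BATT", "2019-09-14 13:30:15"),
    (108, "BATT", "2019-09-14 15:30:15"),
    (108, "LINE", "2019-09-14 16:48:36"),
    (108, "LINE", "2019-09-15 09:20:15"),
]

FMT = "%Y-%m-%d %H:%M:%S"


def minute_diff(start: str, end: str) -> int:
    """Minute boundaries crossed from start to end (seconds dropped first).
    Assumed to mirror Redshift's datediff(minute, start, end)."""
    a = datetime.strptime(start, FMT).replace(second=0)
    b = datetime.strptime(end, FMT).replace(second=0)
    return int((b - a).total_seconds() // 60)


def battery_runs(rows):
    """Yield (first BATT timestamp, next LINE timestamp, minutes) per BATT run.

    Rows must belong to one node and be sorted by timestamp. A BATT run with
    no LINE row after it is not reported, because it has no end time yet.
    """
    run_start = None
    for _node_id, source, ts in rows:
        if source == "BATT" and run_start is None:
            run_start = ts  # first BATT reading of a new run
        elif source == "LINE" and run_start is not None:
            yield run_start, ts, minute_diff(run_start, ts)
            run_start = None  # the first LINE row closes the run


runs = list(battery_runs(ROWS))
for start, end, minutes in runs:
    print(start, end, minutes)
print("total minutes on BATT:", sum(m for _, _, m in runs))  # 1308
```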
Edit:
Here is the expected result:
node_id power_source timestamp ts_line_power ts_diff(in mins)
-----------------------------------------------------------------------------------------
108 BATT 2019-09-10 10:20:15 2019-09-10 13:45:00 205
108 BATT 2019-09-12 05:50:15 2019-09-12 18:45:15 775
108 BATT 2019-09-14 11:20:15 2019-09-14 16:48:36 328
Any help would be appreciated.
If I understand correctly, you can use lead():
select node_id,
sum(datediff(minute, timestamp, next_ts)) as diff_in_minutes
from (select t.*,
lead(timestamp) over (partition by node_id order by timestamp) as next_ts
from t
) t
where power_source = 'BATT'
group by node_id;
This gets the timestamp of the row immediately after each BATT record and uses it as that record's end time. Summing those differences per node gives the total time spent on battery.
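The lead() query above can be tried on the question's sample data with Python's built-in `sqlite3` module. This is a sketch under stated assumptions: it needs SQLite 3.25 or later (for window functions), and because SQLite has no `datediff`, the expression `CAST(strftime('%s', x) AS INTEGER) / 60` is used as a stand-in that counts whole-minute boundaries, assumed to match Redshift's `datediff(minute, ...)`. The per-row differences are 205, 385, 390, 130, 120 and 78 minutes.

```python
import sqlite3

# Sample rows from the question (node_id, power_source, timestamp).
ROWS = [
    (108, "LINE", "2019-09-10 09:15:30"), (108, "BATT", "2019-09-10 10:20:15"),
    (108, "LINE", "2019-09-10 13:45:00"), (108, "LINE", "2019-09-11 06:00:15"),
    (108, "BATT", "2019-09-12 05:50:15"), (108, "BATT", "2019-09-12 12:15:15"),
    (108, "LINE", "2019-09-12 18:45:15"), (108, "LINE", "2019-09-13 09:20:15"),
    (108, "BATT", "2019-09-14 11:20:15"), (108, "BATT", "2019-09-14 13:30:15"),
    (108, "BATT", "2019-09-14 15:30:15"), (108, "LINE", "2019-09-14 16:48:36"),
    (108, "LINE", "2019-09-15 09:20:15"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (node_id INTEGER, power_source TEXT, timestamp TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", ROWS)

# Same shape as the answer's query; the minute arithmetic replaces datediff(minute, ...).
QUERY = """
SELECT node_id,
       SUM(CAST(strftime('%s', next_ts) AS INTEGER) / 60
           - CAST(strftime('%s', timestamp) AS INTEGER) / 60) AS diff_in_minutes
FROM (SELECT t.*,
             LEAD(timestamp) OVER (PARTITION BY node_id ORDER BY timestamp) AS next_ts
      FROM t) t
WHERE power_source = 'BATT'
GROUP BY node_id
"""

result = conn.execute(QUERY).fetchall()
print(result)  # [(108, 1308)]
```

Summing row-to-row differences works here because minute-boundary counts are additive: the gaps between consecutive BATT rows add up to the gap from the first BATT row to the LINE row that ends the run.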
EDIT:
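The above computes the overall total across all "BATT" periods. To get one row per battery period, you have a gaps-and-islands problem. You can assign a group by counting, for each row, the number of LINE records at or after it (ordering by timestamp descending). Each run of BATT rows then shares a group with the LINE record that immediately follows it, so that LINE record supplies the end time.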
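To make the grouping step concrete, here is a sketch that only computes the `grp` label for each sample row, again in SQLite 3.25+ via Python's `sqlite3`. Two assumptions: `CASE WHEN ... THEN 1 ELSE 0 END` replaces the `::int` cast (SQLite has no `::` syntax), and the frame is spelled out as a running count from the latest row back to the current one.

```python
import sqlite3

# Sample rows from the question (node_id, power_source, timestamp).
ROWS = [
    (108, "LINE", "2019-09-10 09:15:30"), (108, "BATT", "2019-09-10 10:20:15"),
    (108, "LINE", "2019-09-10 13:45:00"), (108, "LINE", "2019-09-11 06:00:15"),
    (108, "BATT", "2019-09-12 05:50:15"), (108, "BATT", "2019-09-12 12:15:15"),
    (108, "LINE", "2019-09-12 18:45:15"), (108, "LINE", "2019-09-13 09:20:15"),
    (108, "BATT", "2019-09-14 11:20:15"), (108, "BATT", "2019-09-14 13:30:15"),
    (108, "BATT", "2019-09-14 15:30:15"), (108, "LINE", "2019-09-14 16:48:36"),
    (108, "LINE", "2019-09-15 09:20:15"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (node_id INTEGER, power_source TEXT, timestamp TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", ROWS)

# grp = number of LINE rows at or after this row (running count, newest first).
labels = conn.execute("""
SELECT timestamp, power_source,
       SUM(CASE WHEN power_source = 'LINE' THEN 1 ELSE 0 END)
           OVER (PARTITION BY node_id ORDER BY timestamp DESC
                 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS grp
FROM t
ORDER BY timestamp
""").fetchall()

for ts, source, grp in labels:
    print(ts, source, grp)
```

In the output, the BATT row at 2019-09-10 10:20:15 and the LINE row at 13:45:00 both get `grp` 6, while the LINE row before them gets 7, so each island is exactly "BATT run plus the LINE row that ends it".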
This is all window functions and aggregation:
select node_id,
       min(timestamp) as batt_start,
       max(timestamp) as line_restored,
       datediff(minute, min(timestamp), max(timestamp)) as diff_in_minutes
from (select t.*,
             -- Redshift requires an explicit frame clause for an ordered aggregate window
             sum( (power_source = 'LINE')::int ) over (partition by node_id
                                                       order by timestamp desc
                                                       rows between unbounded preceding and current row) as grp
      from t
     ) t
group by node_id, grp
having sum( (power_source = 'BATT')::int ) > 0; -- only include groups that have at least one BATT
Note that this assumes that only "LINE" and "BATT" are valid values for the power source.
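Below is a sketch of the full gaps-and-islands query run against the sample data with SQLite 3.25+ through Python's `sqlite3`, to check that it reproduces the expected per-period result. Assumptions as before: `CASE` replaces `::int`, and the `strftime('%s', ...) / 60` arithmetic stands in for Redshift's `datediff(minute, ...)`. The difference is computed once per group from `MIN()` and `MAX()`; wrapping it in another `SUM()` would nest one aggregate inside another.

```python
import sqlite3

# Sample rows from the question (node_id, power_source, timestamp).
ROWS = [
    (108, "LINE", "2019-09-10 09:15:30"), (108, "BATT", "2019-09-10 10:20:15"),
    (108, "LINE", "2019-09-10 13:45:00"), (108, "LINE", "2019-09-11 06:00:15"),
    (108, "BATT", "2019-09-12 05:50:15"), (108, "BATT", "2019-09-12 12:15:15"),
    (108, "LINE", "2019-09-12 18:45:15"), (108, "LINE", "2019-09-13 09:20:15"),
    (108, "BATT", "2019-09-14 11:20:15"), (108, "BATT", "2019-09-14 13:30:15"),
    (108, "BATT", "2019-09-14 15:30:15"), (108, "LINE", "2019-09-14 16:48:36"),
    (108, "LINE", "2019-09-15 09:20:15"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (node_id INTEGER, power_source TEXT, timestamp TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", ROWS)

QUERY = """
SELECT node_id,
       MIN(timestamp) AS batt_start,      -- first BATT reading of the island
       MAX(timestamp) AS line_restored,   -- the LINE row that closes the island
       CAST(strftime('%s', MAX(timestamp)) AS INTEGER) / 60
         - CAST(strftime('%s', MIN(timestamp)) AS INTEGER) / 60 AS diff_in_minutes
FROM (SELECT t.*,
             SUM(CASE WHEN power_source = 'LINE' THEN 1 ELSE 0 END)
                 OVER (PARTITION BY node_id ORDER BY timestamp DESC
                       ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS grp
      FROM t) t
GROUP BY node_id, grp
HAVING SUM(CASE WHEN power_source = 'BATT' THEN 1 ELSE 0 END) > 0
ORDER BY batt_start
"""

outages = conn.execute(QUERY).fetchall()
for row in outages:
    print(row)
```

This yields three rows with 205, 775 and 328 minutes. One edge case to keep in mind: a BATT run at the very end of the data has no LINE row after it, so its group ends at its last BATT reading rather than at a power-restored time.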