Reputation: 11
I have a very large temporal data set showing power outages for the entire US over a 2-year span. I want to calculate the duration of an outage using distinct records from 4 columns. The '0' value for CustomersOut is seen as reset.
Example:
Utility1, State1, County1, City1, 1500, 0, 2017-12-28 12:25:00
Utility1, State1, County1, City1, 1500, 50, 2018-01-01 03:25:00
Utility1, State1, County1, City1, 1500, 65, 2018-01-01 03:36:00
Utility1, State1, County1, City1, 1500, 40, 2018-01-01 03:55:00
Utility1, State1, County1, City1, 1500, 0, 2018-01-01 04:34:00
This means in City1, there were 50 customers out for 11 min, 65 out for 19 min, and 40 out for 39 min, for a total of 55.75 customer hours out.
Desired Result:
Utility1, State1, County1, City1, 1500, 0, 2017-12-28 12:25:00, NULL
Utility1, State1, County1, City1, 1500, 50, 2018-01-01 03:25:00, 00:11.00
Utility1, State1, County1, City1, 1500, 65, 2018-01-01 03:36:00, 00:19.00
Utility1, State1, County1, City1, 1500, 40, 2018-01-01 03:55:00, 00:39.00
Utility1, State1, County1, City1, 1500, 0, 2018-01-01 04:34:00, NULL
Upvotes: 0
Views: 60
Reputation: 222582
You could start with a subquery that computes the difference between the current record and the next one. Then, the outer query filters on records that correspond to power outages and aggregates:
select
UtilityName,
StateName,
CountyName,
CityName,
sum(CustomersOut) CustomersOut,
sum(MinutesOut) MinutesOut,
sum(CustomersOut * MinutesOut) / 60.0 CustomersHoursOut
from (
select
UtilityName,
StateName,
CountyName,
CityName,
CustomersOut,
datediff(
minute,
RecordedDateTime,
lead(RecordedDateTime) over(
partition by UtilityName, StateName, CountyName, CityName
order by RecordedDateTime
)
) MinutesOut
from mytable
) t
where CustomersOut > 0
group by
UtilityName,
StateName,
CountyName,
CityName
UtilityName | StateName | CountyName | CityName | CustomersOut | MinutesOut | CustomersHoursOut :---------- | :-------- | :--------- | :------- | -----------: | ---------: | :------------------ Utility1 | State1 | County1 | City1 | 155 | 69 | 55.750000
EDIT
The information that you want can be generated from the inner query:
select
t.*,
case when CustomersOut > 0
then datediff(
minute,
RecordedDateTime,
lead(RecordedDateTime) over(
partition by UtilityName, StateName, CountyName, CityName
order by RecordedDateTime
)
)
end MinutesOut
from mytable t
Demo:
UtilityName | StateName | CountyName | CityName | CustomersTracked | CustomersOut | RecordedDateTime | MinutesOut :---------- | :-------- | :--------- | :------- | ---------------: | -----------: | :------------------ | ---------: Utility1 | State1 | County1 | City1 | 1500 | 0 | 2017-12-28 12:25:00 | null Utility1 | State1 | County1 | City1 | 1500 | 50 | 2018-01-01 03:25:00 | 11 Utility1 | State1 | County1 | City1 | 1500 | 65 | 2018-01-01 03:36:00 | 19 Utility1 | State1 | County1 | City1 | 1500 | 40 | 2018-01-01 03:55:00 | 39 Utility1 | State1 | County1 | City1 | 1500 | 0 | 2018-01-01 04:34:00 | null
Upvotes: 1