Reputation: 1767
I have a datetime column. I need to derive a column of total minutes elapsed from the first to the last value of every hour grouped by hour, but, in cases of overlapping event, the time should be distributed between two hours. There is also a condition where if the elapsed time exceeds 30 minutes in between two consecutive records, then it has to be ignored.
Below, I've explained in three phases, Original, Intermediate (calculating the running total) and Final.
And, I'm planning to take hourly incremental data on the same, so, how can we properly merge it with the old data is another question.
Sample data:
Moves_TS
1/4/2020 10:00
1/4/2020 10:25
1/4/2020 10:42
1/4/2020 10:56
1/4/2020 10:59
1/4/2020 11:02
1/4/2020 11:24
1/4/2020 11:43
1/4/2020 11:55
1/4/2020 12:26
1/4/2020 12:29
Intermediate layer:
Moves_TS Hour Running Total
1/4/2020 10:00 10 0
1/4/2020 10:25 10 25
1/4/2020 10:42 10 42
1/4/2020 10:56 10 56
1/4/2020 10:59 10 60
1/4/2020 11:02 11 2
1/4/2020 11:24 11 24
1/4/2020 11:43 11 43
1/4/2020 11:55 11 55
1/4/2020 12:26 12 0
1/4/2020 12:29 12 3
Final Output:
Hour Work done/Hour
10 60
11 55
12 3
Upvotes: 0
Views: 372
Reputation: 1270523
This is a gaps-and-islands problem with some twists. First, I would summarize by the "islands" defined by the gaps of 30 minutes:
select min(moves_ts) as start_ts, max(moves_ts) as end_ts
from (select o.*,
count(prev_moves_ts) filter (where moves_ts > prev_moves_ts + interval '30 minute') over (order by moves_ts) as grp
from (select o.*, lag(moves_ts) over (order by moves_ts) as prev_moves_ts
from original o
) o
) o
group by grp;
Then you can use this with generate_series()
to expand the data and calculate the overlaps with each hour:
with islands as (
select min(moves_ts) as start_ts, max(moves_ts) as end_ts
from (select o.*,
count(prev_moves_ts) filter (where moves_ts > prev_moves_ts + interval '30 minute') over (order by moves_ts) as grp
from (select o.*, lag(moves_ts) over (order by moves_ts) as prev_moves_ts
from original o
) o
) o
group by grp
)
select hh.hh,
sum( least(hh.hh + interval '1 hour', i.end_ts) -
greatest(hh.hh, i.start_ts)
) as duration
from (select generate_series(date_trunc('hour', min(moves_ts)),
date_trunc('hour', max(moves_ts)),
interval '1 hour'
) hh
from original o
) hh left join
islands i
on i.start_ts < hh.hh + interval '1 hour' and
i.end_ts >= hh.hh
group by hh.hh
order by hh.hh;
Here is a db<>fiddle.
Upvotes: 1
Reputation: 10035
select
MOVES_TS,
Hour,
TO_CHAR(MOVES_TS,'YYYYMMDDHH') DATEHR,
MIN(Moves_TS) over (partition by DATEHR) as MIN_MOVES_TS,
(
DATE_PART('day', MOVES_TS - MIN_MOVES_TS) * 24 +
DATE_PART('hour', MOVES_TS - MIN_MOVES_TS) * 60 +
DATE_PART('minute', MOVES_TS - MIN_MOVES_TS)
) as RunningTotal
from dataset
Upvotes: 0