Reputation: 121
I have a table with stop_id
, sched_time
and act_time
and I'm looking to fill in the blanks in the actual times that I have (using linear interpolation) based on the scheduled times (thus preserving relative time between stops). So I would want to go from something like this:
stop_id | sched_time | act_time | actual
------------------------------------------------
001 | 13:47:00 | 13:45:00 | TRUE
002 | 13:50:00 | null | FALSE
003 | 13:52:00 | 13:53:00 | TRUE
004 | 13:59:00 | null | FALSE
005 | 14:01:00 | null | FALSE
006 | 14:04:00 | 14:04:00 | TRUE
To something like this:
stop_id | sched_time | act_time
-------------------------------------
001 | 13:47:00 | 13:45:00
002 | 13:50:00 | 13:49:48
003 | 13:52:00 | 13:53:00
004 | 13:59:00 | 13:59:25
005 | 14:01:00 | 14:01:15
006 | 14:04:00 | 14:04:00
If it's too much to ask to get the interpolation to respect the original timing between stops, a simple linear interpolation on the act_time
column would be a good place to start as there isn't too much variability in the time difference between stops.
Thanks in advance!
Note: The first act_time
can be before the first sched_time
and there may be more than one consecutive row without an actual time.
Upvotes: 3
Views: 831
Reputation: 16397
This is kind of a "third best" solution, in that once you have an actual time, it keeps track of how much ahead of or behind schedule you are and applies that to the nearest scheduled times without actuals:
with q1 as (
select
t.stop_id, sched_time, act_time,
nvl2(act_time, t.sched_time - t.act_time, null) ahead,
sum (nvl2(act_time, 1, 0)) over
(partition by 1 order by stop_id) as actual_count
from schedule t
)
select
stop_id, sched_time,
act_time,
nvl (act_time, sched_time - min (ahead) over
(partition by actual_count)) as act_time2
from q1
The results don't match exactly what you're after, but it may be something you can build upon:
STOP_ID SCHED_TIME ACT_TIME ACT_TIME2
001 13:47 13:45 13:45
002 13:50 13:48
003 13:52 13:53 13:53
004 13:59 14:00
005 14:01 14:02
006 14:04 14:04 14:04
-- 7/24/14 Edit --
Assuming your times are converted to integers as you suggested (30s = 1), I played with this a bit. This is a hideous solution, but I think it does what you suggest. What I'm not sure is if it does it any faster than your procedural loop. I'm curious if it is or not. Oracle's analytic functions are great, but you can see I do use the heck out of them to do what I think you described:
with q1 as (
select
t.stop_id, t.sched_time, t.act_time,
sum (nvl2(act_time, 1, 0)) over
(partition by 1 order by stop_id) as group_id,
lead (sched_time) over (order by stop_id) as next_sched
from schedule2 t
), q2 as (
select
stop_id, sched_time, act_time, group_id, next_sched,
next_sched - sched_time as elapsed,
row_number() over (partition by group_id order by stop_id) as stops,
min (act_time) over (partition by group_id) as min_time,
min (sched_time) over (partition by group_id) as min_sched
from q1
), q3 as (
select
stop_id, sched_time, act_time, group_id, stops, min_time,
min_sched, next_sched,
sum (elapsed) over (partition by group_id order by stop_id) as elapsed,
max (stops) over (partition by group_id) as grp_stops,
lead (min_time, 1) over (order by stop_id) as next_grp_actual,
lead (min_sched, 1) over (order by stop_id) as next_grp_sched
from q2
), q4 as (
select
stop_id, sched_time, act_time, stops, grp_stops,
min_time, lag (elapsed, 1, 0) over
(partition by group_id order by stop_id) as elapsed,
max (next_grp_sched) over (partition by group_id) - min_sched
as time_btw_sched,
max (next_grp_actual) over (partition by group_id) - min_time
as time_btw_actuals
from q3
)
select
stop_id, sched_time, act_time,
nvl (act_time, min_time + (elapsed / time_btw_sched) *
time_btw_actuals) as act_time2
from q4
And here are the results I got from your sample:
id sched actual actual (calc)
001 1654 1650 1650
002 1660 1659.6
003 1664 1666 1666
004 1678 1678.83333333333
005 1682 1682.5
006 1688 1688 1688
I think this could be done a lot cleaner (and more efficiently) in a programming language wrapper. I'm only proficient in C# and Perl, but either of them could do it quite nicely
Upvotes: 1