Pete Ludlow
Pete Ludlow

Reputation: 121

Interpolate missing values in SQL Developer/Oracle 10g

I have a table with stop_id, sched_time and act_time and I'm looking to fill in the blanks in the actual times that I have (using linear interpolation) based on the scheduled times (thus preserving relative time between stops). So I would want to go from something like this:

  stop_id  |  sched_time  |  act_time  |  actual
------------------------------------------------
  001      |  13:47:00    |  13:45:00  |  TRUE
  002      |  13:50:00    |  null      |  FALSE
  003      |  13:52:00    |  13:53:00  |  TRUE
  004      |  13:59:00    |  null      |  FALSE
  005      |  14:01:00    |  null      |  FALSE
  006      |  14:04:00    |  14:04:00  |  TRUE

To something like this:

  stop_id  |  sched_time  |  act_time
-------------------------------------
  001      |  13:47:00    |  13:45:00
  002      |  13:50:00    |  13:49:48
  003      |  13:52:00    |  13:53:00
  004      |  13:59:00    |  13:59:25
  005      |  14:01:00    |  14:01:15
  006      |  14:04:00    |  14:04:00

If it's too much to ask to get the interpolation to respect the original timing between stops, a simple linear interpolation on the act_time column would be a good place to start as there isn't too much variability in the time difference between stops.

Thanks in advance!

Note: The first act_time can be before the first sched_time and there may be more than one consecutive row without an actual time.

Upvotes: 3

Views: 831

Answers (1)

Hambone
Hambone

Reputation: 16397

This is kind of a "third best" solution, in that once you have an actual time, it keeps track of how much ahead of or behind schedule you are and applies that to the nearest scheduled times without actuals:

with q1 as (
  select
    t.stop_id, sched_time, act_time,
    nvl2(act_time, t.sched_time - t.act_time, null) ahead,
    sum (nvl2(act_time, 1, 0)) over
      (partition by 1 order by stop_id) as actual_count
  from schedule t
)
select
  stop_id, sched_time,
  act_time,
  nvl (act_time, sched_time - min (ahead) over
    (partition by actual_count)) as act_time2
from q1

The results don't match exactly what you're after, but it may be something you can build upon:

STOP_ID   SCHED_TIME  ACT_TIME  ACT_TIME2
001       13:47       13:45     13:45
002       13:50                 13:48
003       13:52       13:53     13:53
004       13:59                 14:00
005       14:01                 14:02
006       14:04       14:04     14:04

-- 7/24/14 Edit --

Assuming your times are converted to integers as you suggested (30s = 1), I played with this a bit. This is a hideous solution, but I think it does what you suggest. What I'm not sure is if it does it any faster than your procedural loop. I'm curious if it is or not. Oracle's analytic functions are great, but you can see I do use the heck out of them to do what I think you described:

with q1 as (
  select
    t.stop_id, t.sched_time, t.act_time,
    sum (nvl2(act_time, 1, 0)) over 
        (partition by 1 order by stop_id) as group_id,
    lead (sched_time) over (order by stop_id) as next_sched
  from schedule2 t
), q2 as (
  select
    stop_id, sched_time, act_time, group_id, next_sched,
    next_sched - sched_time as elapsed,
    row_number() over (partition by group_id order by stop_id) as stops,
    min (act_time) over (partition by group_id) as min_time,
    min (sched_time) over (partition by group_id) as min_sched
  from q1
), q3 as (
  select
    stop_id, sched_time, act_time, group_id, stops, min_time,
    min_sched, next_sched,
    sum (elapsed) over (partition by group_id order by stop_id) as elapsed,
    max (stops) over (partition by group_id) as grp_stops,
    lead (min_time, 1) over (order by stop_id) as next_grp_actual,
    lead (min_sched, 1) over (order by stop_id) as next_grp_sched
  from q2
), q4 as (
  select
    stop_id, sched_time, act_time, stops, grp_stops,
    min_time, lag (elapsed, 1, 0) over
      (partition by group_id order by stop_id) as elapsed,
    max (next_grp_sched) over (partition by group_id) - min_sched
        as time_btw_sched,
    max (next_grp_actual) over (partition by group_id) - min_time
        as time_btw_actuals
  from q3
)
select 
  stop_id, sched_time, act_time,
  nvl (act_time, min_time + (elapsed / time_btw_sched) * 
      time_btw_actuals) as act_time2
from q4

And here are the results I got from your sample:

id     sched   actual  actual (calc)
001    1654    1650    1650
002    1660            1659.6
003    1664    1666    1666
004    1678            1678.83333333333
005    1682            1682.5
006    1688    1688    1688

I think this could be done a lot cleaner (and more efficiently) in a programming language wrapper. I'm only proficient in C# and Perl, but either of them could do it quite nicely

Upvotes: 1

Related Questions