Martin Ocando
Martin Ocando

Reputation: 924

Finding gaps in records

I have a Oracle table that have placard numbers, timestamps and direction, either IN or OUT. Sometimes, the readers fail, and there are gaps,

For the same placard number I should have:

Placard   Timestamp     Direction
2533      15:36         IN
2533      15:41         OUT

But sometimes, I get either:

Placard   Timestamp     Direction
2533      15:36         IN
2533      15:41         IN
2533      15:49         OUT

Meaning the OUT read of the first IN failed, or

Placard   Timestamp     Direction
2533      15:36         IN
2533      15:41         OUT
2533      15:52         OUT

Meaning the IN read of the second pass, failed.

To fill the gaps, I'll simply insert a record to the result with either +1 minute if is a missing OUT, or -1 minute if is a missing IN, and I'll add a fourth field indicating if is a normal, or error record. For example, the intended output would be:

Placard   Timestamp     Direction   Status
2533      15:36         IN          OK
2533      15:41         OUT         OK
2533      15:51         IN          Error IN
2533      15:52         OUT         OK
2533      15:36         IN          OK
2533      15:37         OUT         Error OUT

I hope I explained myself correctly.

I can't provide any SQL, since I have no idea how to do it. Thank you very much for your help.

Upvotes: 0

Views: 68

Answers (2)

Alex Poole
Alex Poole

Reputation: 191265

You need to identify pairs of IN/OUT values, some of which may be missing. Creating an extra row in the result set isn't simple (at least without hitting the table twice, as Aramillo shows); you can use lead and lag to discover that a row isn't there, but creating it is a different matter.

One approach is to use lead and lag to compress the data you do have into in/out pairs, which would leave you with gaps for the missing values, fill in those gaps, and then expand it again. With slightly modified starting data:

select placard, timestamp, direction
from t42 order by placard, timestamp;

   PLACARD TIMESTAMP DIRECTION
---------- --------- ---------
      2533 15:36     IN        
      2533 15:41     OUT       
      2533 15:52     OUT       
      2533 15:56     IN        
      2533 16:02     IN        
      2533 16:07     OUT       
      2533 16:10     IN        

You can peak behind and ahead:

select placard, timestamp, direction,
  lag(direction) over (partition by placard order by timestamp) as last_dir,
  lead(direction) over (partition by placard order by timestamp) as next_dir,
  lag(timestamp) over (partition by placard order by timestamp) as last_ts,
  lead(timestamp) over (partition by placard order by timestamp) as next_ts
from t42;

   PLACARD TIMESTAMP DIRECTION LAST_DIR NEXT_DIR LAST_TS NEXT_TS
---------- --------- --------- -------- -------- ------- -------
      2533 15:36     IN                 OUT              15:41   
      2533 15:41     OUT       IN       OUT      15:36   15:52   
      2533 15:52     OUT       OUT      IN       15:41   15:56   
      2533 15:56     IN        OUT      IN       15:52   16:02   
      2533 16:02     IN        IN       OUT      15:56   16:07   
      2533 16:07     OUT       IN       IN       16:02   16:10   
      2533 16:10     IN        OUT               16:07           

and use case statements as a kind of manual pivot:

with t as (
  ...
)
select placard, timestamp, direction,
  case when direction = 'IN' then timestamp
    when last_dir is null or last_dir = 'OUT'
    then timestamp - interval '1' minute
    else last_ts end as in_ts,
  case when direction = 'OUT' then timestamp
    when next_dir is null or next_dir = 'IN'
    then timestamp + interval '1' minute
    else next_ts end as out_ts,
  case when direction = 'OUT' and (last_dir is null or last_dir = 'OUT')
    then 'Error IN' else 'OK' end as in_error,
  case when direction = 'IN' and (next_dir is null or next_dir = 'IN')
    then 'Error OUT' else 'OK' end as out_error
from t
order by placard, timestamp;

   PLACARD TIMESTAMP DIRECTION IN_TS OUT_TS IN_ERROR OUT_ERROR
---------- --------- --------- ----- ------ -------- ---------
      2533 15:36     IN        15:36 15:41  OK       OK        
      2533 15:41     OUT       15:36 15:41  OK       OK        
      2533 15:52     OUT       15:51 15:52  Error IN OK        
      2533 15:56     IN        15:56 15:57  OK       Error OUT 
      2533 16:02     IN        16:02 16:07  OK       OK        
      2533 16:07     OUT       16:02 16:07  OK       OK        
      2533 16:10     IN        16:10 16:11  OK       Error OUT 

which now has duplicates for the pairs where you originally had both values, and one-minute-adjusted values for those where you didn't; losing the original direction and timestamp rows you can get down to the core pairs:

with t as (
  ...
)
select distinct placard,
  case when direction = 'IN' then timestamp
    when last_dir is null or last_dir = 'OUT'
    then timestamp - interval '1' minute
    else last_ts end as in_ts,
  case when direction = 'OUT' then timestamp
    when next_dir is null or next_dir = 'IN'
    then timestamp + interval '1' minute
    else next_ts end as out_ts,
  case when direction = 'OUT' and (last_dir is null or last_dir = 'OUT')
    then 'Error IN' else 'OK' end as in_error,
  case when direction = 'IN' and (next_dir is null or next_dir = 'IN')
    then 'Error OUT' else 'OK' end as out_error
from t
order by placard, in_ts;

   PLACARD IN_TS OUT_TS IN_ERROR OUT_ERROR
---------- ----- ------ -------- ---------
      2533 15:36 15:41  OK       OK        
      2533 15:51 15:52  Error IN OK        
      2533 15:56 15:57  OK       Error OUT 
      2533 16:02 16:07  OK       OK        
      2533 16:10 16:11  OK       Error OUT 

And then finally you can unpivot that back to separate rows for the in and out records:

with t as (
  ...
)
select * from (
  select distinct placard,
    case when direction = 'IN' then timestamp
      when last_dir is null or last_dir = 'OUT'
      then timestamp - interval '1' minute
      else last_ts end as in_ts,
    case when direction = 'OUT' then timestamp
      when next_dir is null or next_dir = 'IN'
      then timestamp + interval '1' minute
      else next_ts end as out_ts,
    case when direction = 'OUT' and (last_dir is null or last_dir = 'OUT')
      then 'Error IN' else 'OK' end as in_error,
    case when direction = 'IN' and (next_dir is null or next_dir = 'IN')
      then 'Error OUT' else 'OK' end as out_error
  from t
)
unpivot ((timestamp, error) for direction in
  ((in_ts, in_error) as 'IN', (out_ts, out_error) as 'OUT'))
order by placard, timestamp;

   PLACARD DIRECTION TIMESTAMP ERROR    
---------- --------- --------- ---------
      2533 IN        15:36     OK        
      2533 OUT       15:41     OK        
      2533 IN        15:51     Error IN  
      2533 OUT       15:52     OK        
      2533 IN        15:56     OK        
      2533 OUT       15:57     Error OUT 
      2533 IN        16:02     OK        
      2533 OUT       16:07     OK        
      2533 IN        16:10     OK        
      2533 OUT       16:11     Error OUT 

SQL Fiddle demo, including a second placard.

Upvotes: 2

Aramillo
Aramillo

Reputation: 3216

You need to use analytic function, but also you need union to create new rows. Please take a look at this, i tried to put all posibilities in the test data:

with t(Placard   ,ts  ,Direction   ,Status) as (
select 2533   ,   to_timestamp('15:36','hh24:mi')   ,      'IN'      ,    'OK' from dual union all
select 2533   ,   to_timestamp('15:41','hh24:mi')   ,     'IN'     ,    'OK' from dual union all
select 2533   ,   to_timestamp('15:49','hh24:mi')   ,     'OUT'     ,    'OK' from dual UNION ALL
select 2533   ,   to_timestamp('15:55','hh24:mi')   ,     'OUT'     ,    'OK' from dual UNION ALL
select 2533   ,   to_timestamp('16:00','hh24:mi')   ,     'IN'     ,    'OK' from dual)
--test data  

(SELECT PLACARD,TO_CHAR(CASE WHEN (DIRECTION = 'IN') THEN TS+INTERVAL '1' MINUTE ELSE TS_NEXT-INTERVAL '1' minute end,'HH24:MI') TS,
CASE WHEN DIRECTION = 'IN' THEN 'OUT' ELSE 'IN' END DIRECTION,
'ERROR '||CASE WHEN DIRECTION = 'IN' THEN 'OUT' ELSE 'IN' END STATUS
 FROM 
(select t1.*,lead(TS) over(order by TS) TS_NEXT,
case when (lead(t1.direction) over(order by TS) = direction) or 
(direction = 'IN' AND lead(t1.direction) over(order by TS) IS NULL) then 'Y' else 'N' end WRONG from t t1)
WHERE WRONG = 'Y')
UNION ALL SELECT PLACARD,TO_CHAR(TS,'HH24:MI'),DIRECTION,STATUS FROM T
ORDER BY TS;

Output:

   PLACARD TS    DIRECTION STATUS   
---------- ----- --------- ---------
      2533 15:36 IN        OK       
      2533 15:37 OUT       ERROR OUT
      2533 15:41 IN        OK       
      2533 15:49 OUT       OK       
      2533 15:54 IN        ERROR IN 
      2533 15:55 OUT       OK       
      2533 16:00 IN        OK       
      2533 16:01 OUT       ERROR OUT

8 rows selected.

Upvotes: 2

Related Questions