Ayush
Ayush

Reputation: 623

Calculating time spans in oracle

I need to calclulate time span of employees in oracle database. I need to implement a logic as follows:

Case 1

Jan 1 - Jan 5
Jan 6 - Jan 10     

AS the time interval between Jan 5 and Jan 6 is only one day so, the overall output should be

Jan 1 - Jan 10

Case 2

Jan 1 - Jan 5
Jan 7 - Jan 10

AS the time interval between Jan 5 and Jan 7 is more than one day so, the overall output should be

Jan 1 - Jan 5
Jan 7 - Jan 10

There can be any number of rows for each employee. I know it can be done by use of lead/lag functions, but could not get it solved. Can anyone help me??

The sample data I used is as follows:

empid    FROMDATE           TODATE
=====    ========           ======
1   01.01.2013  03.01.2013 
1   02.01.2013  05.01.2013 
2   01.01.2013  04.01.2013 
2   02.01.2013  03.01.2013 
2   02.01.2013  06.01.2013 
3   01.01.2013  02.01.2013 
3   04.01.2013  06.01.2013 
3   01.01.2013  04.01.2013 
4   01.01.2013  03.01.2013 
4   04.01.2013  06.01.2013 
5   01.01.2013  06.01.2013 
5   01.01.2013  02.01.2013 
5   02.01.2013  05.01.2013 
5   03.01.2013  04.01.2013 
6   01.01.2013  02.01.2013 
6   02.01.2013  03.01.2013 
6   05.01.2013  06.01.2013 
6   05.01.2013  07.01.2013 

In case of empid 1-5 the min from date and max to date is gives me the solution, i am stuck up on the cases of empid 6 where the gap is more than 1 day.

Upvotes: 0

Views: 101

Answers (1)

Alex Poole
Alex Poole

Reputation: 191235

The date overlaps, particularly that some date ranges are entirely within others, makes this complicated as Jeffrey Kemp noted. The simplest way to deal with that might be to explode all the ranges into all their individual days, and then combine them back into distinct ranges. One way to explode them, if you have 11gR2, is with recursive subquery factoring (CTE):

with r (empid, onedate, todate) as (
  select empid, fromdate, todate
  from t42
  union all
  select empid, onedate + 1, todate
  from r
  where onedate < todate
)
...

This generates all the dates for all the employees; but it has duplicates because of the overlaps, so you can eliminate those:

..., 
s as (
  select distinct empid, onedate
  from r
)
...

Then you're back to using lead and lag to spot the contiguous ranges. This can be compressed a bit but I've left it like this so it's easier (I hope) to follow the logic). First find the previous and next date for the employee:

...,
t as (
  select empid, onedate,
    lag(onedate) over (partition by empid order by onedate) as lagdate,
    lead(onedate) over (partition by empid order by onedate) as leaddate
  from s
)
...

And effectively blank out that are mid-range:

...,
u as (
  select empid, onedate, lagdate, leaddate,
    case when lagdate is null or lagdate < onedate - 1 then onedate end
      as fromdate,
    case when leaddate is null or leaddate > onedate + 1 then onedate end
      as todate
  from t
)
...

And finally collapse the calculated rows you have left, using lead and lag again - which you can do because the 'from' and 'to' records are adjacent, if we eliminate all the mid-range values:

select distinct empid,
  case when fromdate is null then lag(fromdate)
    over (partition by empid order by onedate) else fromdate end as fromdate,
  case when todate is null then lead(todate)
    over (partition by empid order by onedate) else todate end as todate
from u
where fromdate is not null
or todate is not null
order by empid, fromdate;

So putting that all together:

with r (empid, onedate, todate) as (
  select empid, fromdate, todate
  from t42
  union all
  select empid, onedate + 1, todate
  from r
  where onedate < todate
),
s as (
  select distinct empid, onedate
  from r
),
t as (
  select empid, onedate,
    lag(onedate) over (partition by empid order by onedate) as lagdate,
    lead(onedate) over (partition by empid order by onedate) as leaddate
  from s
),
u as (
  select empid, onedate, lagdate, leaddate,
    case when lagdate is null or lagdate < onedate - 1 then onedate end
      as fromdate,
    case when leaddate is null or leaddate > onedate + 1 then onedate end
      as todate
  from t
)
select distinct empid,
  case when fromdate is null then lag(fromdate)
    over (partition by empid order by onedate) else fromdate end as fromdate,
  case when todate is null then lead(todate)
    over (partition by empid order by onedate) else todate end as todate
from u
where fromdate is not null
or todate is not null
order by empid, fromdate;

... gives:

     EMPID FROMDATE   TODATE   
---------- ---------- ----------
         1 2013-01-01 2013-01-05 
         2 2013-01-01 2013-01-06 
         3 2013-01-01 2013-01-06 
         4 2013-01-01 2013-01-06 
         5 2013-01-01 2013-01-06 
         6 2013-01-01 2013-01-03 
         6 2013-01-05 2013-01-07 

 7 rows selected 

This works in 11.2.0.3, but the recursive CTE seems to give the wrong answer on SQL Fiddle, which is 11.2.0.2 - so not sure if that's seeing a bug. And you can't use it in previous versions anyway. Expanding ranges from multiple rows using connect by is tricky, and I'm trying to avoid a function, but you can do this instead:

with r as (
  select mindate + level - 1 as onedate
  from (
    select min(fromdate) as mindate, max(todate) as maxdate
    from t42
  )
  connect by level <= maxdate - mindate + 1
),
s as (
  select distinct t.empid, r.onedate
  from r
  join t42 t on r.onedate between t.fromdate and t.todate
)
...

With the rest of the CTEs and the query above, that does work on SQL Fiddle, and produces the same output. And it'll work at least back to 10g. This Fiddle shows it broken down into the stages so you can see how the data is being manipulated at each point.

Upvotes: 1

Related Questions