Reputation: 623
I need to calclulate time span of employees in oracle database. I need to implement a logic as follows:
Case 1
Jan 1 - Jan 5
Jan 6 - Jan 10
AS the time interval between Jan 5 and Jan 6 is only one day so, the overall output should be
Jan 1 - Jan 10
Case 2
Jan 1 - Jan 5
Jan 7 - Jan 10
AS the time interval between Jan 5 and Jan 7 is more than one day so, the overall output should be
Jan 1 - Jan 5
Jan 7 - Jan 10
There can be any number of rows for each employee. I know it can be done by use of lead/lag functions, but could not get it solved. Can anyone help me??
The sample data I used is as follows:
empid FROMDATE TODATE
===== ======== ======
1 01.01.2013 03.01.2013
1 02.01.2013 05.01.2013
2 01.01.2013 04.01.2013
2 02.01.2013 03.01.2013
2 02.01.2013 06.01.2013
3 01.01.2013 02.01.2013
3 04.01.2013 06.01.2013
3 01.01.2013 04.01.2013
4 01.01.2013 03.01.2013
4 04.01.2013 06.01.2013
5 01.01.2013 06.01.2013
5 01.01.2013 02.01.2013
5 02.01.2013 05.01.2013
5 03.01.2013 04.01.2013
6 01.01.2013 02.01.2013
6 02.01.2013 03.01.2013
6 05.01.2013 06.01.2013
6 05.01.2013 07.01.2013
In case of empid 1-5 the min from date and max to date is gives me the solution, i am stuck up on the cases of empid 6 where the gap is more than 1 day.
Upvotes: 0
Views: 101
Reputation: 191235
The date overlaps, particularly that some date ranges are entirely within others, makes this complicated as Jeffrey Kemp noted. The simplest way to deal with that might be to explode all the ranges into all their individual days, and then combine them back into distinct ranges. One way to explode them, if you have 11gR2, is with recursive subquery factoring (CTE):
with r (empid, onedate, todate) as (
select empid, fromdate, todate
from t42
union all
select empid, onedate + 1, todate
from r
where onedate < todate
)
...
This generates all the dates for all the employees; but it has duplicates because of the overlaps, so you can eliminate those:
...,
s as (
select distinct empid, onedate
from r
)
...
Then you're back to using lead
and lag
to spot the contiguous ranges. This can be compressed a bit but I've left it like this so it's easier (I hope) to follow the logic). First find the previous and next date for the employee:
...,
t as (
select empid, onedate,
lag(onedate) over (partition by empid order by onedate) as lagdate,
lead(onedate) over (partition by empid order by onedate) as leaddate
from s
)
...
And effectively blank out that are mid-range:
...,
u as (
select empid, onedate, lagdate, leaddate,
case when lagdate is null or lagdate < onedate - 1 then onedate end
as fromdate,
case when leaddate is null or leaddate > onedate + 1 then onedate end
as todate
from t
)
...
And finally collapse the calculated rows you have left, using lead
and lag
again - which you can do because the 'from' and 'to' records are adjacent, if we eliminate all the mid-range values:
select distinct empid,
case when fromdate is null then lag(fromdate)
over (partition by empid order by onedate) else fromdate end as fromdate,
case when todate is null then lead(todate)
over (partition by empid order by onedate) else todate end as todate
from u
where fromdate is not null
or todate is not null
order by empid, fromdate;
So putting that all together:
with r (empid, onedate, todate) as (
select empid, fromdate, todate
from t42
union all
select empid, onedate + 1, todate
from r
where onedate < todate
),
s as (
select distinct empid, onedate
from r
),
t as (
select empid, onedate,
lag(onedate) over (partition by empid order by onedate) as lagdate,
lead(onedate) over (partition by empid order by onedate) as leaddate
from s
),
u as (
select empid, onedate, lagdate, leaddate,
case when lagdate is null or lagdate < onedate - 1 then onedate end
as fromdate,
case when leaddate is null or leaddate > onedate + 1 then onedate end
as todate
from t
)
select distinct empid,
case when fromdate is null then lag(fromdate)
over (partition by empid order by onedate) else fromdate end as fromdate,
case when todate is null then lead(todate)
over (partition by empid order by onedate) else todate end as todate
from u
where fromdate is not null
or todate is not null
order by empid, fromdate;
... gives:
EMPID FROMDATE TODATE
---------- ---------- ----------
1 2013-01-01 2013-01-05
2 2013-01-01 2013-01-06
3 2013-01-01 2013-01-06
4 2013-01-01 2013-01-06
5 2013-01-01 2013-01-06
6 2013-01-01 2013-01-03
6 2013-01-05 2013-01-07
7 rows selected
This works in 11.2.0.3, but the recursive CTE seems to give the wrong answer on SQL Fiddle, which is 11.2.0.2 - so not sure if that's seeing a bug. And you can't use it in previous versions anyway. Expanding ranges from multiple rows using connect by
is tricky, and I'm trying to avoid a function, but you can do this instead:
with r as (
select mindate + level - 1 as onedate
from (
select min(fromdate) as mindate, max(todate) as maxdate
from t42
)
connect by level <= maxdate - mindate + 1
),
s as (
select distinct t.empid, r.onedate
from r
join t42 t on r.onedate between t.fromdate and t.todate
)
...
With the rest of the CTEs and the query above, that does work on SQL Fiddle, and produces the same output. And it'll work at least back to 10g. This Fiddle shows it broken down into the stages so you can see how the data is being manipulated at each point.
Upvotes: 1