Reputation: 17
I'm working with Oracle and have sample data to show you regarding my question (sample data is below).
I'm trying to figure out how to derive years of service for employees that have terminated and returned several times (up to 6 times). The business rules are that if someone leaves and returns before 365 days, then we bridge their service -- otherwise we don't. Bridging is determined by the "Yes" and "No" values in the BRIDGE fields.
My sample data below: The HIRE, TERM and BRIDGE fields go up to 3 levels but it can at times go all the way up to 6 levels deep for some employees.
╔════════╦════════════╦════════════╦══════════╦════════════╦═══════════╦══════════╦═══════════╗
║ EMPLID ║ HIRE_1 ║ TERM_1 ║ BRIDGE_1 ║ HIRE_2 ║ TERM_2 ║ BRIDGE_2 ║ HIRE_3 ║
╠════════╬════════════╬════════════╬══════════╬════════════╬═══════════╬══════════╬═══════════╣
║ 1001 ║ 9/22/2011 ║ 12/9/2011 ║ Yes ║ 7/1/2012 ║ 7/2/2013 ║ Yes ║ 5/3/2014 ║
║ 1002 ║ 3/29/1999 ║ 6/30/1999 ║ Yes ║ 1/24/2000 ║ 3/29/2004 ║ No ║ 11/5/2007 ║
║ 1003 ║ 8/16/2009 ║ 12/31/2009 ║ Yes ║ 1/1/2010 ║ 5/15/2012 ║ Yes ║ 9/5/2012 ║
║ 1004 ║ 7/23/2007 ║ 8/21/2009 ║ Yes ║ 3/1/2010 ║ 2/9/2011 ║ Yes ║ 4/25/2011 ║
║ 1005 ║ 5/11/2010 ║ 8/31/2010 ║ Yes ║ 7/1/2011 ║ 5/5/2012 ║ Yes ║ 9/6/2012 ║
║ 1006 ║ 1/17/2009 ║ 5/8/2009 ║ Yes ║ 12/8/2009 ║ 5/4/2010 ║ Yes ║ 1/18/2011 ║
║ 1007 ║ 9/20/2004 ║ 5/4/2007 ║ No ║ 11/15/2010 ║ 1/21/2011 ║ Yes ║ 2/16/2011 ║
║ 1008 ║ 6/22/2009 ║ 8/29/2009 ║ No ║ 5/9/2011 ║ 4/13/2012 ║ Yes ║ 5/21/2012 ║
║ 1009 ║ 10/13/2008 ║ 5/9/2009 ║ Yes ║ 8/24/2009 ║ 5/25/2010 ║ Yes ║ 8/10/2010 ║
║ 1010 ║ 8/13/2008 ║ 5/7/2010 ║ Yes ║ 1/4/2011 ║ 5/31/2011 ║ Yes ║ 8/15/2011 ║
║ 5011 ║ 7/3/1985 ║ 10/24/2000 ║ No ║ 7/19/2010 ║ ║ No ║ ║
╚════════╩════════════╩════════════╩══════════╩════════════╩═══════════╩══════════╩═══════════╝
For the last entry, 5011, the employee was rehired on 7/19/2010 and hasn't left. So the calculation should be a little less than 5 years (4.xx). Because the prior termination date between the next hire date was more than 365 days, bridge_1
is "No". So this resets years recognized for service. Otherwise if the employee was rehired less than 365 days from the prior term, then the bridge_1
would be yes.
I'm looking for the most recent period of bridged employment, so if there are two periods separated by a gap of more than a year (as for employee 5011) then I only want the most recent, not the total of all the separate contiguous (bridged) periods.
Upvotes: 0
Views: 400
Reputation: 191275
If you were working with the underlying data rather than the pivoted view you could use analytics and a gaps-and-isands approach. It's overkill here as @PonderStibbons pointed out1 but you could unpivot what you have back to something close to its original form:
with cte1 as (
select t.emplid, t.num, t.hire, nvl(t.tmp_term, sysdate) as term,
bridge as prev_bridge,
nvl(lead(bridge) over (partition by emplid order by num), 'No') as next_bridge
from (select t42.*, null as term_3, 'No' as bridge_0 from t42)
unpivot ((hire, tmp_term, bridge) for num in (
(hire_1, term_1, bridge_0) as 1,
(hire_2, term_2, bridge_1) as 2,
(hire_3, term_3, bridge_2) as 3)) t
)
select * from cte1;
EMPLID NUM HIRE TERM PRE NEX
---------- ---------- ---------- ---------- --- ---
1001 1 2011-09-22 2011-12-09 No Yes
1001 2 2012-07-01 2013-07-02 Yes Yes
1001 3 2014-05-03 2015-06-16 Yes No
1002 1 1999-03-29 1999-06-30 No Yes
1002 2 2000-01-24 2004-03-29 Yes No
1002 3 2007-11-05 2015-06-16 No No
...
This adds dummy term_3
and bridge_0
columns to the table; then unpivots it so you get one row for each set of entires; then it defaults the last open-ended hire to end at today's date, on the assumption you want their total hire time up to today. Because the default behaviour of unpivot is to exclude nulls, this will not create rows for unused 'levels', just up to the last hire date.
I've only used the three sets, but you can easily extend this to all six by changing the dummy column names to term_6
etc. and adding more in
clauses.
From that you can see the contiguous periods, and can ignore any row where which is bridging both sides - where prev_bridge
and next_bridge
are both Yes. (Normally you'd use both lead and lag to work this out from the dates, but might as well use the bridge flags we already have).
with cte1 as (
...
),
cte2 as (
select emplid, num,
case when prev_bridge = 'No' then hire end as hire,
case when next_bridge = 'No' then term end as term
from cte1
where prev_bridge = 'No' or next_bridge = 'No'
)
select * from cte2
order by emplid, num;
EMPLID NUM HIRE TERM
---------- ---------- ---------- ----------
1001 1 2011-09-22
1001 3 2015-06-16
1002 1 1999-03-29
1002 2 2004-03-29
1002 3 2007-11-05 2015-06-16
...
From that you can get the total period covered by each contiguous group, by using lead and lag to compress the periods split across rows (with null hire or term), and using distinct since that will generate duplicates:
with cte1 as (
...
),
cte2 as (
...
),
cte3 as (
select distinct emplid,
case when hire is null then lag(num)
over (partition by emplid order by num) else num end as num,
months_between(case when term is null then lead(term)
over (partition by emplid order by num) else term end,
case when hire is null then lag(hire)
over (partition by emplid order by num) else hire end) as period_months
from cte2
)
select emplid, period_months
from cte3
order by emplid, num;
EMPLID PERIOD_MONTHS
---------- -------------
1001 44.83
1002 60.00
1002 91.38
...
And finally sum those distinct periods:
with cte1 as (
select t.emplid, t.num, t.hire,
nvl(t.tmp_term, sysdate) as term, bridge as prev_bridge,
nvl(lead(bridge) over (partition by emplid order by num), 'No') as next_bridge
from (select t42.*, null as term_3, 'No' as bridge_0 from t42)
unpivot ((hire, tmp_term, bridge) for num in (
(hire_1, term_1, bridge_0) as 1,
(hire_2, term_2, bridge_1) as 2,
(hire_3, term_3, bridge_2) as 3)) t
),
cte2 as (
select emplid, num,
case when prev_bridge = 'No' then hire end as hire,
case when next_bridge = 'No' then term end as term
from cte1
where prev_bridge = 'No' or next_bridge = 'No'
),
cte3 as (
select distinct emplid,
case when hire is null then lag(num)
over (partition by emplid order by num) else num end as num,
months_between(case when term is null then lead(term)
over (partition by emplid order by num) else term end,
case when hire is null then lag(hire)
over (partition by emplid order by num) else hire end) as period_months
from cte2
)
select emplid, sum(period_months) as total_months,
round(sum(period_months)/12, 2) as total_years
from cte3
group by emplid
order by emplid;
EMPLID TOTAL_MONTHS TOTAL_YEARS
---------- ------------ -----------
1001 44.83 3.74
1002 151.38 12.62
1003 70.00 5.83
...
You said you wanted the answer in years, so you can divide by 12, and optionally truncate (or round, or floor, or ceil); you may want to do that before you sum or after you sum, which may give different results sometimes, but I've done if after here just as a demo.
SQL Fiddle with full results, including for a row that only has the first hire date, and a row with two gaps.
If you're only interested in the most recent contiguous/bridged period, this can be simplified to:
with cte1 as (
select t.emplid, t.num,
case when t.bridge = 'No' then t.hire else null end as hire,
case when t.hire is not null and t.term is null
then sysdate else t.term end as term
from (select emplid, 'No' as bridge_0, hire_1, term_1,
nvl2(hire_2, bridge_1, null) as bridge_1, hire_2, term_2,
nvl2(hire_3, bridge_2, null) as bridge_2, hire_3, null as term_3
from t42
)
unpivot ((hire, term, bridge) for num in (
(hire_1, term_1, bridge_0) as 1,
(hire_2, term_2, bridge_1) as 2,
(hire_3, term_3, bridge_2) as 3)) t
)
select emplid,
round(months_between(max(term), max(hire)) / 12, 2) as total_years
from cte1
group by emplid
order by emplid;
EMPLID TOTAL_YEARS
---------- -----------
1001 3.74
1002 7.62
1003 5.84
1004 7.90
1005 5.10
1006 6.42
1007 4.59
1008 4.11
1009 6.68
1010 6.85
1011 3.31
1012 1.88
5011 4.91
1 - don't look at the edit history...
Upvotes: 1
Reputation: 191275
As @PonderStibbons pointed out, this can be done quite simply by adding together the span of each hired period, and also adding the span of the between-hiring periods if the bridge value is 'Yes'. Open-ended hirings and different numbers of hirings can be handled by treating all null dates as the same, presumably as sysdate if you want the current hiring period to be counted up to today.
You can either calculate as days and divide by a nominal value for the number of days in the year (again as Ponder showed!), or use month subtraction and divide by 12:
select emplid,
round((months_between(nvl(term_1, sysdate), hire_1)
+ case when bridge_1 = 'Yes' then months_between(hire_2, term_1) else 0 end
+ months_between(nvl(term_2, sysdate), nvl(hire_2, sysdate))
+ case when bridge_2 = 'Yes' then months_between(hire_3, term_2) else 0 end
+ months_between(sysdate, nvl(hire_3, sysdate))) / 12, 2) as total_years
from t42
order by emplid;
EMPLID TOTAL_YEARS
---------- -----------
1001 3.74
1002 12.62
1003 5.84
1004 7.90
1005 5.10
1006 6.42
1007 7.21
1008 4.29
1009 6.68
1010 6.85
1011 3.31
1012 2.21
5011 20.22
For all six potential hiring periods, repeat the pattern; only the first and last clauses in the addition will be different, as hire_1 is always present, and term_6 won't exist.
If you only want the most recent contiguous/bridged period, not the total of all contiguous/bridged periods, you need to be more selective about which periods are included - starting from the most recent and working backwards, taking all later bridges into account:
select emplid,
round((
-- 3rd period
months_between(sysdate, nvl(hire_3, sysdate))
-- gap between 3rd and 2nd
+ case when (hire_3 is not null and bridge_2 = 'Yes') then
months_between(hire_3, term_2) else 0 end
-- 2nd period
+ case when (hire_3 is null or bridge_2 = 'Yes') then
months_between(nvl(term_2, sysdate), nvl(hire_2, sysdate)) else 0 end
-- gap between 2nd and 1st
+ case when (hire_3 is not null and bridge_2 = 'Yes')
and (hire_1 is not null and bridge_1 = 'Yes') then
months_between(hire_2, term_1) else 0 end
-- 1st period
+ case when (hire_3 is null or bridge_2 = 'Yes')
and (hire_2 is null or bridge_1 = 'Yes') then
months_between(nvl(term_1, sysdate), nvl(hire_1, sysdate)) else 0 end
) / 12, 2) as total_years
from t42
order by emplid;
EMPLID TOTAL_YEARS
---------- -----------
1001 3.74
1002 7.62
1003 5.84
1004 7.90
1005 5.10
1006 6.42
1007 4.59
1008 4.11
1009 6.68
1010 6.85
1011 3.31
1012 1.88
5011 4.91
SQL Fiddle, including a breakdown for each period/gap if it is should be included.
Upvotes: 0