Descagnia
Descagnia

Reputation: 11

Multiple Records Identifying Problem Rows

My table from PeopleSoft PS_JOB has ID Numbers (my example, only one emplid 12345).

The IDs can have multiple employee records (the example has 3 empl_rcd total: 0, 1, 2). This means the employee has three jobs within the organization.

Each EMPL_RCD has various action-action_reason combinations that represent HR changes (things like rehire, term, pay rate increase, etc).

The HR changes are effective dated. The HR changes are represented in rows with an EFFDT (start date) and an END_DATE (end date) of that HR change.

Empl_rcd 0 has 10 rows worth of HR changes.

Empl_rcd 1 has 6 rows worth of HR changes.

Empl_Rcd 2 has 2 rows worth of HR changes.

The empl_Rcds can be active or terminated.

I have the data sorted by EFFDT so we can see the order of HR changes.

Initially, empl_Rcd 0 was the only record that existed. Empl_rcd 0 was hired (HIR-INH) EFFDT 6/14/21. Then, 0 had a pay rate change (PAY-SPI) EFFDT 3/1/22. Then, 0 was terminated (TER-FXD) EFFDT 9/1/22.

Then, 0 was rehired (REH-REH) EFFDT 11/1/22. This is important. This row has the correct ACTION_REASON value of "REH". Since at this point in time, there are no other active empl_rcds, the correction action_Reason to use is "REH." Together with the action, the values are REH-REH.

Then, 0 has a pay rate change (PAY-CWP) EFFDT 2/1/23.

Then, a new empl_rcd is added--empl_Rcd = 1. This is important. Empl_rcd 1 is rehired (REH) with action_reason "ADL" EFFDT 3/1/22. The action action_Reason combo of REH-ADL is correct because at this point in time, empl_rcd = 0 was active. Anytime there is an active job and someone is rehiring another job, the action_Reason to use must be 'ADL' (not REH).

Empl_Rcd 1 has a PAY-CWP EFFDT 5/1/23. Empl_Rcd 1 is terminated (TER-CTG) EFFDT 7/1/23.

Then, this is important, empl_rcd = 1 gets rehired incorrectly. Empl_rcd = 1 has action REH with action_Reason "REH" EFFDT 10/1/23. This is wrong. Because empl_rcd = 0 was active at this point in time, this row should have been REH-ADL (not REH-REH).

These are the problem rows I'm trying to identify.

I need help identifying certain REH rows that need the action_Reason fixed.

Rows 11 and 16 are both wrong and should be ADL since there was always at least one active record before the rehire. These are the two rows I'd like to select in a SQL.

I can't figure out how to isolate the problem rows. I was attempting using the EFFDT and END_DATE but I'm not anywhere near figuring this out.

Maybe lag or lead would help?

Identifying rows within multiple date ranges is something I've never been able to select...

EMPLID EMPL_RCD EFFDT END_DATE EFFSEQ ACTION ACTION_REASON EMPL_STATUS
12345 0 6/14/21 3/1/22 0 HIR INH A
12345 0 3/01/22 9/1/22 0 PAY SPI A
12345 0 9/01/22 11/1/22 0 TER CTG T
12345 0 11/1/22 2/1/23 0 REH REH A
12345 0 2/1/23 4/1/23 0 PAY CWP A
12345 1 3/1/23 5/1/23 0 REH ADL A
12345 0 4/1/23 8/16/23 0 TER CTG T
12345 1 5/1/23 7/1/23 0 PAY CWP A
12345 1 7/1/23 10/1/23 0 TER CTG T
12345 0 8/16/23 12/1/23 0 REH REH A
12345 1 10/1/23 10/16/23 0 REH REH A
12345 1 10/16/23 12/16/23 0 PAY CWP A
12345 0 12/1/23 3/1/24 0 TER CTG T
12345 2 12/1/23 4/16/24 0 REH ADL A
12345 1 12/16/23 1/1/50 0 TER CTG T
12345 0 3/1/24 6/16/24 0 REH REH A**
12345 2 4/16/24 1/1/50 0 TER CTG T
12345 0 6/14/21 1/1/50 0 TER CTG T

All the selects I've been trying haven't worked :( They're not even worth sharing.

Update: user d s gave me a great SQL. I ran it in my database, and two of the rows have the wrong error message: output from the SQL d s gave

Upvotes: 1

Views: 122

Answers (2)

shawnt00
shawnt00

Reputation: 17915

EDIT after clarifications

Presumably you use Oracle . You'll be looking at the most recent action of each employee record to see if any weren't termination. I'm also presuming that 'TER' will be the final entry in such cases. You might need to limit the set of actions if that's not reliably true.

The inner query computes a list of the IsTerminated statuses (Y/N) for each record up each date. The outer grouping min(IsTerminated) determines whether any of them are 'N' aka "Active" (falling earlier in the alphabet.) You can remove the translation from T/A -> Y/N if preferred.

select
    EMPLID, EMPL_RCD, EFFDT,
    min(ACTION) as ACTION, min(ACTION_REASON) as ACTION_REASON,
    min(IsTerminated) as IsRehire,
    case when min(ACTION_REASON) = 'REH' and min(IsTerminated) = 'N' or
              min(ACTION_REASON) = 'ADL' and min(IsTerminated) = 'Y' then '!!' else '' end as IsError
from PS_Job j cross apply (
    select
        case first_value(ACTION) over (partition by EMPLID, EMPL_RCD order by EFFDT desc)
            when 'TER' then 'Y' else 'N' end as IsTerminated
    from PS_Job j2
    where j2.EMPLID = j.EMPLID and j2.EFFDT < j.EFFDT
) terminations
where ACTION = 'REH'
group by EMPLID, EMPL_RCD, EFFDT
order by EMPLID, EFFDT;

https://dbfiddle.uk/nuuobi3V

There might be a slightly cleaner way so I'll think about it some more. You should also specify desired output if this doesn't work for you.

Cleaner version?

This second variation pushes the aggregation down into the subquery so that the top level is cleaner. Otherwise it works basically the same way. Another difference is using an outer apply so that all rows are returning while pushing down the filter on 'REH' so that only those rows get the lookup applied.

select
    EMPLID, EMPL_RCD, EFFDT, ACTION, ACTION_REASON,
    IsRehire,
    case when ACTION_REASON = 'REH' and IsRehire = 'N' or
              ACTION_REASON = 'ADL' and IsRehire = 'Y' then '!!' else '' end as IsError
from PS_Job j outer apply (
    select min(IsTerminated) as IsRehire from (
        select
            case first_value(ACTION) over (partition by EMPLID, EMPL_RCD order by EFFDT desc)
                when 'TER' then 'Y' else 'N' end as IsTerminated
        from PS_Job j2
        where j2.EMPLID = j.EMPLID and j2.EFFDT < j.EFFDT and j.ACTION = 'REH'
    ) t0
) t 
order by EMPLID, EFFDT;

https://sqlfiddle.com/oracle/online-compiler?id=d5d2ce4b-b586-46d1-83be-b8853d823460

Original attempt

I don't think your requirement is fully clear--for one I cannot make sense of the "all-records-that-exist-have-status-T stipulation. But I think this will help get you started as it will bundle each set of rows between 'REH' actions. Using the counts I think you can derive the rest of what you need:

with grp as (
    select *,
        count(case when action = 'REH' then 1 end)
             over (partition by emplid order by effdt desc) as grp
    from PS_Job
)
select emplid, min(effdt) as start_date, max(end_date) as end_date)
    count(case when empl_status = 'A' then 1 end) as A,
    count(case when empl_status = 'T' then 1 end) as T
where grp > 0
group by emplid, grp
order by emplid, grp desc;

Another idea is that you may need to both partition and group with the empl_rcd column. You haven't fully explained what that's for either.

Upvotes: 0

d r
d r

Reputation: 7758

This was a bit tricky to understand and if I got it wright you could try this:

--      S a m p l e    D a t a :
Create Table PS_JOB As
  ( Select  12345 "EMPLID", 0 "EMPL_RCD", To_Date('6/14/21', 'mm/dd/yy') "EFFDT", To_Date('3/1/22', 'mm/dd/yy') "END_DATE", 0 "EFFSEQ", 'HIR' "ACTION", 'INH' "ACTION_REASON", 'A' "EMPL_STATUS" From Dual Union All 
    Select  12345,  0,  To_Date('3/01/22',  'mm/dd/yy'), To_Date('9/1/22',   'mm/dd/yy'),  0,   'PAY',  'SPI',  'A' From Dual Union All 
    Select  12345,  0,  To_Date('9/01/22',  'mm/dd/yy'), To_Date('11/1/22',   'mm/dd/yy'), 0,   'TER',  'CTG',  'T' From Dual Union All 
    Select  12345,  0,  To_Date('11/1/22',  'mm/dd/yy'), To_Date('2/1/23',    'mm/dd/yy'), 0,   'REH',  'REH',  'A' From Dual Union All 
    Select  12345,  0,  To_Date('2/1/23',   'mm/dd/yy'), To_Date('4/1/23',    'mm/dd/yy'), 0,   'PAY',  'CWP',  'A' From Dual Union All 
    Select  12345,  1,  To_Date('3/1/23',   'mm/dd/yy'), To_Date('5/1/23',    'mm/dd/yy'), 0,   'REH',  'ADL',  'A' From Dual Union All 
    Select  12345,  0,  To_Date('4/1/23',   'mm/dd/yy'), To_Date('8/16/23',   'mm/dd/yy'), 0,   'TER',  'CTG',  'T' From Dual Union All 
    Select  12345,  1,  To_Date('5/1/23',   'mm/dd/yy'), To_Date('7/1/23',    'mm/dd/yy'), 0,   'PAY',  'CWP',  'A' From Dual Union All 
    Select  12345,  1,  To_Date('7/1/23',   'mm/dd/yy'), To_Date('10/1/23',   'mm/dd/yy'), 0,   'TER',  'CTG',  'T' From Dual Union All 
    Select  12345,  0,  To_Date('8/16/23',  'mm/dd/yy'), To_Date('12/1/23',   'mm/dd/yy'), 0,   'REH',  'REH',  'A' From Dual Union All 
    Select  12345,  1,  To_Date('10/1/23',  'mm/dd/yy'), To_Date('10/16/23',  'mm/dd/yy'), 0,   'REH',  'REH',  'A' From Dual Union All 
    Select  12345,  1,  To_Date('10/16/23', 'mm/dd/yy'), To_Date('12/16/23',  'mm/dd/yy'), 0,   'PAY',  'CWP',  'A' From Dual Union All 
    Select  12345,  0,  To_Date('12/1/23',  'mm/dd/yy'), To_Date('3/1/24',    'mm/dd/yy'), 0,   'TER',  'CTG',  'T' From Dual Union All 
    Select  12345,  2,  To_Date('12/1/23',  'mm/dd/yy'), To_Date('4/16/24',   'mm/dd/yy'), 0,   'REH',  'ADL',  'A' From Dual Union All 
    Select  12345,  1,  To_Date('12/16/23', 'mm/dd/yy'), To_Date('1/1/50',    'mm/dd/yy'), 0,   'TER',  'CTG',  'T' From Dual Union All 
    Select  12345,  0,  To_Date('3/1/24',   'mm/dd/yy'), To_Date('6/16/24',   'mm/dd/yy'), 0,   'REH',  'REH',  'A**' From Dual Union All 
    Select  12345,  2,  To_Date('4/16/24',  'mm/dd/yy'), To_Date('1/1/50',    'mm/dd/yy'), 0,   'TER',  'CTG',  'T' From Dual Union All 
    Select  12345,  0,  To_Date('6/14/21',  'mm/dd/yy'), To_Date('1/1/50',    'mm/dd/yy'), 0,   'TER',  'CTG',  'T' From Dual 
  );

... first create a cte that will prepare your data for Case expression conditions in the main sql. You need to get in the same row EFFDT and END_DATE of the termination. I used Max() Over() analytic function with propper Partition By, Order By and windowing part.

--    P r e p a r i n g    T h e   D a t a :
WITH 
  grid AS
    ( Select     ROWNUM "RN", EMPLID, EMPL_RCD, EFFDT, END_DATE, EFFSEQ, ACTION, ACTION_REASON, EMPL_STATUS, 
                 CASE WHEN ACTION != 'TER' 
                      THEN Max(Case When EMPL_STATUS = 'T' Then EFFDT End) 
                           Over( Partition By EMPLID, EMPL_RCD Order By EMPLID, EMPL_RCD, EFFDT, END_DATE  
                                 Rows Between Unbounded Preceding And 1 Preceding ) 
                 ELSE EFFDT
                 END "EFFDT_TERMINATED", 
                 CASE WHEN ACTION != 'TER' 
                      THEN Max(Case When EMPL_STATUS = 'T' Then END_DATE End) 
                           Over( Partition By EMPLID, EMPL_RCD Order By EMPLID, EMPL_RCD, EFFDT, END_DATE  
                                 Rows Between Unbounded Preceding And 1 Preceding ) 
                 ELSE END_DATE
                 END "END_DATE_TERMINATED" 
     From       PS_JOB 
    )

... use Case expression conditions to check if the ACTION_REASON column is ok or not for ACTION = 'REH'. Keep in mind that Case works sequentialy (first When condition true returns the Then value and exits Case)...

--    M a i n    S Q L :
SELECT   EMPLID, EMPL_RCD, EFFDT, END_DATE, EFFSEQ, ACTION, ACTION_REASON, EMPL_STATUS, 
         Case When (STAT = 'A' And ACTION = 'REH' And ACTION_REASON != 'ADL')
                  OR 
                   (STAT = 'T' And ACTION = 'REH' And ACTION_REASON != 'ADL' And
                      Case When Max(Case When STAT = 'A' Then END_DATE End) 
                                 Over(Partition By EMPLID) Between EFFDT And END_DATE
                           Then 1
                      End = 1
                   )
              Then 'ERR - ACTION_REASON should be ADL - there are active EMPLIDs'
              --
              When STAT = 'T' And ACTION = 'REH' And ACTION_REASON != 'REH'
              Then 'ERR - ACTION_REASON should be REH - there are no active EMPLIDs'
              When STAT = 'T' And ACTION = 'REH' And ACTION_REASON = 'REH'
              Then 'OK'
              --
              When 
              Case When Max(Case When STAT = 'A' Then END_DATE End) 
                           Over(Partition By EMPLID) Between EFFDT And END_DATE 
                  Then 1 
             Else 0 
             End = 1 And ACTION = 'REH' And ACTION_REASON != 'ADL'
              THEN 'ERR - ACTION_REASON should be ADL - there are active EMPLIDs'
              --
              When STAT = 'A' And ACTION = 'REH' And ACTION_REASON = 'ADL'
              Then 'OK'
         Else '-'
         End "NOTICE"
FROM ( SELECT   g.*, 
               Case When g.EMPL_STATUS = 'T' Then 'TERM'
                    When EFFDT Between EFFDT_TERMINATED And END_DATE_TERMINATED And
                         END_DATE Between EFFDT_TERMINATED And END_DATE_TERMINATED
                    Then 'T'
               Else 'A'
               End  "STAT"
      FROM     grid g
   )
ORDER BY  RN
/*      R e s u l t : 
EMPLID  EMPL_RCD    EFFDT       END_DATE    EFFSEQ  ACTION  ACTION_REASON   EMPL_STATUS   NOTICE
------  ----------  ----------  ---------- -------  ------  -------------   ------------  ---------------------------------------------------------------
12345            0  14-JUN-21   01-MAR-22       0   HIR     INH             A             -
12345            0  01-MAR-22   01-SEP-22       0   PAY     SPI             A             -
12345            0  01-SEP-22   01-NOV-22       0   TER     CTG             T             -
12345            0  01-NOV-22   01-FEB-23       0   REH     REH             A             OK
12345            0  01-FEB-23   01-APR-23       0   PAY     CWP             A             -
12345            1  01-MAR-23   01-MAY-23       0   REH     ADL             A             OK
12345            0  01-APR-23   16-AUG-23       0   TER     CTG             T             -
12345            1  01-MAY-23   01-JUL-23       0   PAY     CWP             A             -
12345            1  01-JUL-23   01-OCT-23       0   TER     CTG             T             -
12345            0  16-AUG-23   01-DEC-23       0   REH     REH             A             OK
12345            1  01-OCT-23   16-OCT-23       0   REH     REH             A             ERR - ACTION_REASON should be ADL - there are active EMPLIDs
12345            1  16-OCT-23   16-DEC-23       0   PAY     CWP             A             -
12345            0  01-DEC-23   01-MAR-24       0   TER     CTG             T             -
12345            2  01-DEC-23   16-APR-24       0   REH     ADL             A             OK
12345            1  16-DEC-23   01-JAN-50       0   TER     CTG             T             -
12345            0  01-MAR-24   16-JUN-24       0   REH     REH             A**           ERR - ACTION_REASON should be ADL - there are active EMPLIDs
12345            2  16-APR-24   01-JAN-50       0   TER     CTG             T             -
12345            0  14-JUN-21   01-JAN-50       0   TER     CTG             T             -                                                                */

Upvotes: 0

Related Questions