Sudd_SQL
Sudd_SQL

Reputation: 1

To find records from Table A having effective dates closest to records in table B

I have two tables as below:

Table A:

ID Age Dept Salary Start_date
1 30 A $1000 01-01-2000
1 31 B $1200 01-01-2022
2 25 C $1200 01-06-2021
2 26 A $1300 01-01-2022
3 34 D $1400 01-01-2021
3 35 C $1800 01-01-2022

Table B:

ID Salary Start_date
1 $1500 01-06-2022
2 $1800 01-01-2022
3 $1600 01-06-2021

I want to insert synthetic new records in Table A, when there is a mismatch in ID and start_date combination between Table A and Table B. The new record should get the Age & Dept from table A for the record which has the closest start date from the missing record in Table B, and salary and start date from table B.

Output:

ID Age Dept Salary Start_date
1 30 A $1000 01-01-2000
1 31 B $1200 01-01-2022
1 31 B $1500 01-06-2022
2 25 C $1200 01-06-2021
2 26 A $1300 01-01-2022
3 34 D $1400 01-01-2021
3 34 D $1600 01-06-2021
3 35 C $1500 01-01-2022

Please help to write SQL queries which will achieve this.

I am facing problems in identifying the rows from Table A which has the closest start_date from the records in Table B.

Upvotes: 0

Views: 49

Answers (2)

d r
d r

Reputation: 7801

If I got it right - just the rows without matching ID, START_DATE combination (in both tables) should be added. That means the row with ID = 2 in table B will not affect the matching row in table A. If that is the case than you need a combination of union all from the tables and Row_Number() and Last_Value() analytic functions along with some Case expressions.

WITH        --  Sample Data
    tbl_a (ID, AGE, DEPT, SALARY, START_DATE) AS
        (   Select 1, 30, 'A', 1000, DATE '2000-01-01' From Dual Union All
            Select 1, 31, 'B', 1200, DATE '2022-01-01' From Dual Union All
            Select 2, 25, 'C', 1200, DATE '2021-06-01' From Dual Union All
            Select 2, 26, 'A', 1300, DATE '2022-01-01' From Dual Union All
            Select 3, 34, 'D', 1400, DATE '2021-01-01' From Dual Union All
            Select 3, 35, 'C', 1800, DATE '2022-01-01' From Dual
        ),
    tbl_b (ID, SALARY, START_DATE) AS
        (   Select 1, 1500, DATE '2022-06-01' From Dual Union All
            Select 2, 1800, DATE '2022-01-01' From Dual Union All
            Select 3, 1600, DATE '2021-06-01' From Dual
        )
--  
--  M a i n    S Q L :
SELECT    ID, 
          Case  When AGE Is Not Null Then AGE
          Else LAST_VALUE(AGE) OVER(Partition By ID Order By ID, START_DATE, TBL Rows Between Unbounded Preceding And 1 Preceding) 
          End "AGE",
          --
          Case  When DEPT Is Not Null Then DEPT 
          Else LAST_VALUE(DEPT) OVER(Partition By ID Order By ID, START_DATE, TBL Rows Between Unbounded Preceding And 1 Preceding)
          End "DEPT",
          --
          SALARY "SALARY", START_DATE "START_DATE"
FROM      ( Select    * 
            From      ( Select    TBL, ID, AGE, DEPT, SALARY, START_DATE, 
                                  Row_Number() OVER(Partition By ID, START_DATE Order By ID, START_DATE, TBL) "RN"
                        From    ( Select        'A' "TBL", ID, AGE, DEPT, SALARY, START_DATE From tbl_a Union All
                                  Select        'B' "TBL", ID, Null, Null, SALARY, START_DATE From tbl_b
                                )
                      ) 
            Where     RN = 1 )
ORDER BY  ID, START_DATE, DEPT

... the resulting dataset should be:

--      ID        AGE DEPT     SALARY START_DATE
--  ------ ---------- ---- ---------- ----------
--       1         30 A          1000 01-JAN-00
--       1         31 B          1200 01-JAN-22
--       1         31 B          1500 01-JUN-22   <-- added row
--       2         25 C          1200 01-JUN-21
--       2         26 A          1300 01-JAN-22
--       3         34 D          1400 01-JAN-21
--       3         34 D          1600 01-JUN-21   <-- added row
--       3         35 C          1800 01-JAN-22

... this could be merged into table A to insert the missing rows:

Merge Into tbl_a a
Using ( ... main sql ... ) x ON(x.ID = a.ID And 
                                x.START_DATE = a.START_DATE) 
When Not Matched
Then  INSERT (a.ID, a.AGE, a.DEPT, a.SALARY, a.START_DATE) 
      Values (x.ID, x.AGE, x.DEPT, x.SALARY, x.START_DATE)

Note: ON clause should be adjusted (probably) to ensure unique row addressing in your table A.

Upvotes: 0

MT0
MT0

Reputation: 168096

Use UNION ALL to combine the two tables and then the LAST_VALUE analytic function to get the most recent age and dept values when they are not present:

SELECT id,
       LAST_VALUE(age) IGNORE NULLS
         OVER (PARTITION BY id ORDER BY start_date) AS age,
       LAST_VALUE(dept) IGNORE NULLS
         OVER (PARTITION BY id ORDER BY start_date) AS dept,
       salary,
       start_date
FROM   (
  SELECT id, age, dept, salary, start_date
  FROM   table_a
UNION ALL
  SELECT id, NULL, NULL, salary, start_date
  FROM   table_b
)

Which, for the sample data:

CREATE TABLE table_a (ID, Age, Dept, Salary, Start_date) AS
SELECT 1, 30, 'A', 1000, DATE '2000-01-01' FROM DUAL UNION ALL
SELECT 1, 31, 'B', 1200, DATE '2022-01-01' FROM DUAL UNION ALL
SELECT 2, 25, 'C', 1200, DATE '2021-06-01' FROM DUAL UNION ALL
SELECT 2, 26, 'A', 1300, DATE '2022-01-01' FROM DUAL UNION ALL
SELECT 3, 34, 'D', 1400, DATE '2021-01-01' FROM DUAL UNION ALL
SELECT 3, 35, 'C', 1800, DATE '2022-01-01' FROM DUAL;

CREATE TABLE Table_B (ID, Salary, Start_date) AS
SELECT 1, 1500, DATE '2022-06-01' FROM DUAL UNION ALL
SELECT 2, 1800, DATE '2022-01-01' FROM DUAL UNION ALL
SELECT 3, 1600, DATE '2021-06-01' FROM DUAL;

Outputs:

ID AGE DEPT SALARY START_DATE
1 30 A 1000 2000-01-01 00:00:00
1 31 B 1200 2022-01-01 00:00:00
1 31 B 1500 2022-06-01 00:00:00
2 25 C 1200 2021-06-01 00:00:00
2 26 A 1300 2022-01-01 00:00:00
2 26 A 1800 2022-01-01 00:00:00
3 34 D 1400 2021-01-01 00:00:00
3 34 D 1600 2021-06-01 00:00:00
3 35 C 1800 2022-01-01 00:00:00

If you want to INSERT the missing rows then, from Oracle 12, you can use a LATERAL join and FETCH FIRST ROW ONLY to find the most recent age and dept values for each row of table_b:

INSERT INTO table_a (id, age, dept, salary, start_date)
SELECT b.id, a.age, a.dept, b.salary, b.start_date
FROM   table_b b
       LEFT OUTER JOIN LATERAL (
         SELECT a.age, a.dept
         FROM   table_a a
         WHERE  a.id = b.id
         AND    a.start_date <= b.start_date
         ORDER BY start_date DESC
         FETCH FIRST ROW ONLY
       ) a
       ON 1 = 1;

fiddle

Upvotes: 1

Related Questions