Reputation: 1
I have two tables as below:
Table A:
ID | Age | Dept | Salary | Start_date |
---|---|---|---|---|
1 | 30 | A | $1000 | 01-01-2000 |
1 | 31 | B | $1200 | 01-01-2022 |
2 | 25 | C | $1200 | 01-06-2021 |
2 | 26 | A | $1300 | 01-01-2022 |
3 | 34 | D | $1400 | 01-01-2021 |
3 | 35 | C | $1800 | 01-01-2022 |
Table B:
ID | Salary | Start_date |
---|---|---|
1 | $1500 | 01-06-2022 |
2 | $1800 | 01-01-2022 |
3 | $1600 | 01-06-2021 |
I want to insert synthetic new records in Table A, when there is a mismatch in ID and start_date combination between Table A and Table B. The new record should get the Age & Dept from table A for the record which has the closest start date from the missing record in Table B, and salary and start date from table B.
Output:
ID | Age | Dept | Salary | Start_date |
---|---|---|---|---|
1 | 30 | A | $1000 | 01-01-2000 |
1 | 31 | B | $1200 | 01-01-2022 |
1 | 31 | B | $1500 | 01-06-2022 |
2 | 25 | C | $1200 | 01-06-2021 |
2 | 26 | A | $1300 | 01-01-2022 |
3 | 34 | D | $1400 | 01-01-2021 |
3 | 34 | D | $1600 | 01-06-2021 |
3 | 35 | C | $1500 | 01-01-2022 |
Please help to write SQL queries which will achieve this.
I am facing problems in identifying the rows from Table A which has the closest start_date from the records in Table B.
Upvotes: 0
Views: 49
Reputation: 7801
If I got it right - just the rows without matching ID, START_DATE combination (in both tables) should be added. That means the row with ID = 2 in table B will not affect the matching row in table A. If that is the case than you need a combination of union all from the tables and Row_Number() and Last_Value() analytic functions along with some Case expressions.
WITH -- Sample Data
tbl_a (ID, AGE, DEPT, SALARY, START_DATE) AS
( Select 1, 30, 'A', 1000, DATE '2000-01-01' From Dual Union All
Select 1, 31, 'B', 1200, DATE '2022-01-01' From Dual Union All
Select 2, 25, 'C', 1200, DATE '2021-06-01' From Dual Union All
Select 2, 26, 'A', 1300, DATE '2022-01-01' From Dual Union All
Select 3, 34, 'D', 1400, DATE '2021-01-01' From Dual Union All
Select 3, 35, 'C', 1800, DATE '2022-01-01' From Dual
),
tbl_b (ID, SALARY, START_DATE) AS
( Select 1, 1500, DATE '2022-06-01' From Dual Union All
Select 2, 1800, DATE '2022-01-01' From Dual Union All
Select 3, 1600, DATE '2021-06-01' From Dual
)
--
-- M a i n S Q L :
SELECT ID,
Case When AGE Is Not Null Then AGE
Else LAST_VALUE(AGE) OVER(Partition By ID Order By ID, START_DATE, TBL Rows Between Unbounded Preceding And 1 Preceding)
End "AGE",
--
Case When DEPT Is Not Null Then DEPT
Else LAST_VALUE(DEPT) OVER(Partition By ID Order By ID, START_DATE, TBL Rows Between Unbounded Preceding And 1 Preceding)
End "DEPT",
--
SALARY "SALARY", START_DATE "START_DATE"
FROM ( Select *
From ( Select TBL, ID, AGE, DEPT, SALARY, START_DATE,
Row_Number() OVER(Partition By ID, START_DATE Order By ID, START_DATE, TBL) "RN"
From ( Select 'A' "TBL", ID, AGE, DEPT, SALARY, START_DATE From tbl_a Union All
Select 'B' "TBL", ID, Null, Null, SALARY, START_DATE From tbl_b
)
)
Where RN = 1 )
ORDER BY ID, START_DATE, DEPT
... the resulting dataset should be:
-- ID AGE DEPT SALARY START_DATE
-- ------ ---------- ---- ---------- ----------
-- 1 30 A 1000 01-JAN-00
-- 1 31 B 1200 01-JAN-22
-- 1 31 B 1500 01-JUN-22 <-- added row
-- 2 25 C 1200 01-JUN-21
-- 2 26 A 1300 01-JAN-22
-- 3 34 D 1400 01-JAN-21
-- 3 34 D 1600 01-JUN-21 <-- added row
-- 3 35 C 1800 01-JAN-22
... this could be merged into table A to insert the missing rows:
Merge Into tbl_a a
Using ( ... main sql ... ) x ON(x.ID = a.ID And
x.START_DATE = a.START_DATE)
When Not Matched
Then INSERT (a.ID, a.AGE, a.DEPT, a.SALARY, a.START_DATE)
Values (x.ID, x.AGE, x.DEPT, x.SALARY, x.START_DATE)
Note: ON clause should be adjusted (probably) to ensure unique row addressing in your table A.
Upvotes: 0
Reputation: 168096
Use UNION ALL
to combine the two tables and then the LAST_VALUE
analytic function to get the most recent age
and dept
values when they are not present:
SELECT id,
LAST_VALUE(age) IGNORE NULLS
OVER (PARTITION BY id ORDER BY start_date) AS age,
LAST_VALUE(dept) IGNORE NULLS
OVER (PARTITION BY id ORDER BY start_date) AS dept,
salary,
start_date
FROM (
SELECT id, age, dept, salary, start_date
FROM table_a
UNION ALL
SELECT id, NULL, NULL, salary, start_date
FROM table_b
)
Which, for the sample data:
CREATE TABLE table_a (ID, Age, Dept, Salary, Start_date) AS
SELECT 1, 30, 'A', 1000, DATE '2000-01-01' FROM DUAL UNION ALL
SELECT 1, 31, 'B', 1200, DATE '2022-01-01' FROM DUAL UNION ALL
SELECT 2, 25, 'C', 1200, DATE '2021-06-01' FROM DUAL UNION ALL
SELECT 2, 26, 'A', 1300, DATE '2022-01-01' FROM DUAL UNION ALL
SELECT 3, 34, 'D', 1400, DATE '2021-01-01' FROM DUAL UNION ALL
SELECT 3, 35, 'C', 1800, DATE '2022-01-01' FROM DUAL;
CREATE TABLE Table_B (ID, Salary, Start_date) AS
SELECT 1, 1500, DATE '2022-06-01' FROM DUAL UNION ALL
SELECT 2, 1800, DATE '2022-01-01' FROM DUAL UNION ALL
SELECT 3, 1600, DATE '2021-06-01' FROM DUAL;
Outputs:
ID | AGE | DEPT | SALARY | START_DATE |
---|---|---|---|---|
1 | 30 | A | 1000 | 2000-01-01 00:00:00 |
1 | 31 | B | 1200 | 2022-01-01 00:00:00 |
1 | 31 | B | 1500 | 2022-06-01 00:00:00 |
2 | 25 | C | 1200 | 2021-06-01 00:00:00 |
2 | 26 | A | 1300 | 2022-01-01 00:00:00 |
2 | 26 | A | 1800 | 2022-01-01 00:00:00 |
3 | 34 | D | 1400 | 2021-01-01 00:00:00 |
3 | 34 | D | 1600 | 2021-06-01 00:00:00 |
3 | 35 | C | 1800 | 2022-01-01 00:00:00 |
If you want to INSERT
the missing rows then, from Oracle 12, you can use a LATERAL
join and FETCH FIRST ROW ONLY
to find the most recent age
and dept
values for each row of table_b
:
INSERT INTO table_a (id, age, dept, salary, start_date)
SELECT b.id, a.age, a.dept, b.salary, b.start_date
FROM table_b b
LEFT OUTER JOIN LATERAL (
SELECT a.age, a.dept
FROM table_a a
WHERE a.id = b.id
AND a.start_date <= b.start_date
ORDER BY start_date DESC
FETCH FIRST ROW ONLY
) a
ON 1 = 1;
Upvotes: 1