Reputation: 313
I have one master table in which the data is getting loaded from an external source. The table structure of the master table - PROD_MAIN is
PROD_ROW_ID | PROD_VALUE | PROD_TYPE | PROD_DATE
The data gets loaded from master table into two other tables. The two tables are:
PROD, PROD_ENT
PROD and PROD_ENT tables are joined using the following condition.
PROD.PROD_ROW_ID = PROD_ENT.PROD_PAR_ID
Everyday the data is inserted from PROD_MAIN into these two tables, PROD and PROD_ENT tables. Due to some DB issues, the many records got missed loading into PROD and PROD_ENT tables.
So, I need to check the missing records for 3 months, i.e from 19th Nov to 19th Feb; which did not load into PROD and PROD_ENT tables. All these tables have around 200 million of records.
So, I wrote the below query to get the results. However it is giving me zero records. Could you please help on this?
SELECT /*+ PARALLEL (PROD_MAIN,15) */ MH.*
FROM PROD PMN,
PROD_ENT PCH,
PROD_MAIN MH
WHERE PMN.PROD_ROW_ID = PCH.PROD_PAR_ID
AND MH.PROD_ROW_ID(+) = PMN.PROD_ROW_ID
AND MH.PROD_VALUE(+) = PCH.PROD_VALUE
AND MH.PROD_TYPE(+) = PMN.PROD_TYPE
AND MH.PROD_DATE (+) = PCH.PROD_DATE
AND MH.PROD_ROW_ID IS NULL
AND MH.PROD_VALUE IS NULL
AND MH.PROD_TYPE IS NULL
AND MH.PROD_DATE IS NULL
AND MH.PROD_DATE BETWEEN '19-NOV-2015' AND '19-FEB-2016'
***** Edited code *****
In case, 1. If I need to get the records that are PRESENT in PROD_MAIN tables and NOT PRESENT in PROD. 2. And if If I need to get the records that are PRESENT in PROD_MAIN tables and NOT PRESENT in PROD_ENT tables separately, do I need to write a union query like below or is there any other easy way to do so ?
SELECT MH.*
FROM PROD_MAIN AS MH
LEFT JOIN PROD AS PMN
ON (PMN.PROD_ROW_ID = MH.PROD_ROW_ID
AND PMN.PROD_TYPE = MH.PROD_TYPE)
LEFT JOIN PROD_ENT AS PCH
ON (PCH.PROD_PAR_ID = PMN.PROD_ROW_ID
AND PCH.PROD_DATE = MH.PROD_DATE
AND PCH.PROD_VALUE = MH.PROD_VALUE)
WHERE PMN.PROD_ROW_ID IS NULL
AND MH.PROD_DATE BETWEEN '19-NOV-2015' AND '19-FEB-2016'
UNION
SELECT MH.*
FROM PROD_MAIN AS MH
LEFT JOIN PROD AS PMN
ON (PMN.PROD_ROW_ID = MH.PROD_ROW_ID
AND PMN.PROD_TYPE = MH.PROD_TYPE)
LEFT JOIN PROD_ENT AS PCH
ON (PCH.PROD_PAR_ID = PMN.PROD_ROW_ID
AND PCH.PROD_DATE = MH.PROD_DATE
AND PCH.PROD_VALUE = MH.PROD_VALUE)
WHERE PCH.PROD_PAR_ID IS NULL
AND MH.PROD_DATE BETWEEN '19-NOV-2015' AND '19-FEB-2016'
Upvotes: 0
Views: 86
Reputation: 146239
It looks like your logic is flawed. You say the data is loaded from PROD_MAIN into the other tables. Therefore you need to be looking for rows which are in that table but not in PROD and PROD_ENT.
However, your anti-join is filtering on PROD_MAIN columns. These two filters can never both be true:
MH.PROD_DATE IS NULL
AND MH.PROD_DATE BETWEEN '19-NOV-2015' AND '19-FEB-2016'
Likewise, if you are looking for non-existent rows in PROD and PROD_ENT then this join condition can never be true:
PMN.PROD_ROW_ID = PCH.PROD_PAR_ID
Almost certainly you need to be checking that join columns in PROD and PROD_ENT are null. I'm using the ANSI 92 syntax, because it makes outer joins much easier to understand.
SELECT /*+ PARALLEL (PROD_MAIN,15) */ MH.*
FROM PROD_MAIN MH
left outer join PROD PMN
on MH.PROD_ROW_ID = PMN.PROD_ROW_ID
and MH.PROD_TYPE = PMN.PROD_TYPE
left outer join PROD_ENT PCH
on MH.PROD_ROW_ID = PCH.PROD_PAR_ID
and MH.PROD_VALUE = PCH.PROD_VALUE
and MH.PROD_DATE = PCH.PROD_DATE
where MH.PROD_DATE BETWEEN date '2015-11-19' AND date '2016-02-19'
AND PCH.PROD_PAR_ID IS NULL
AND PMN.PROD_ROW_ID IS NULL
Not sure whether all those columns are necessary in the join clauses: I have simply copied your join logic.
" I have a small scenario again....can I write a union query like the one updated in main question?"
The way you've written your query will produce results but you won't be able to distinguish the three categories (in PROD but not in PROD_ENT, in PROD_ENT but not in PROD, in neither). In that would be a useful piece of information you need to amend the query slightly:
SELECT 'PROD' as tgt_table, MH.*
FROM PROD_MAIN AS MH
LEFT JOIN PROD AS PMN
ON (PMN.PROD_ROW_ID = MH.PROD_ROW_ID
AND PMN.PROD_TYPE = MH.PROD_TYPE)
WHERE PMN.PROD_ROW_ID IS NULL
AND MH.PROD_DATE BETWEEN '19-NOV-2015' AND '19-FEB-2016'
UNION ALL
SELECT 'PROD_ENT' as tgt_table, MH.*
FROM PROD_MAIN AS MH
LEFT JOIN PROD AS PMN
ON (PMN.PROD_ROW_ID = MH.PROD_ROW_ID
AND PMN.PROD_TYPE = MH.PROD_TYPE)
LEFT JOIN PROD_ENT AS PCH
ON (PCH.PROD_PAR_ID = PMN.PROD_ROW_ID
AND PCH.PROD_DATE = MH.PROD_DATE
AND PCH.PROD_VALUE = MH.PROD_VALUE)
WHERE PCH.PROD_PAR_ID IS NULL
AND MH.PROD_DATE BETWEEN '19-NOV-2015' AND '19-FEB-2016'
Use UNION ALL rather than UNION to avoid the unnecessary sort.
Your version of the Outer Join on PROD_ENT is different from mine. Your version joins on PCH.PROD_PAR_ID = PMN.PROD_ROW_ID so it will give false positives for PROD_ENT records which are actually present but lack an owner record in PROD. If that can never happen then it doesn't matter, but as you appear to be investigating a muck up in your loading process you probably should be as precise as possible.
Upvotes: 1
Reputation: 477
You have to use left join-
SELECT MH.*
FROM PROD_MAIN AS MH
LEFT JOIN PROD AS PMN
ON (PMN.PROD_ROW_ID = MH.PROD_ROW_ID
AND PMN.PROD_TYPE = MH.PROD_TYPE)
LEFT JOIN PROD_ENT AS PCH
ON (PCH.PROD_PAR_ID = PMN.PROD_ROW_ID
AND PCH.PROD_DATE = MH.PROD_DATE
AND PCH.PROD_VALUE = MH.PROD_VALUE)
WHERE PMN.PROD_ROW_ID IS NULL OR PCH.PROD_PAR_ID IS NULL
AND MH.PROD_DATE BETWEEN '19-NOV-2015' AND '19-FEB-2016'
Notice the following line is the key to separate the entries that are in PROD_MAIN but not in PROD or PROD_ENT
WHERE PMN.PROD_ROW_ID IS NULL OR PCH.PROD_PAR_ID IS NULL
By using left join you are first considering all the rows in the left table i.e. PROD_MAIN and then you are also taking rows those have a match with the right table which is PROD by the comparing PROD_ROW_ID (ignore PROD_TYPE
for simplicity).
LEFT JOIN PROD AS PMN ON (PMN.PROD_ROW_ID = MH.PROD_ROW_ID)
If there is a PROD_ROW_ID ( Again Ignoring PROD_TYPE) that is in PROD_MAIN but not in PROD for that row all the columns of PROD will contain null. So in the WHERE
clause you can just check any column of the right table to be null
WHERE PMN.PROD_ROW_ID IS NULL
Upvotes: 0
Reputation: 1165
I need to get those records that are PRESENT in PROD_MAIN tables and NOT PRESENT in PROD AND PROD_ENT tables
Try:
SELECT PROD_ROW_ID -- DELETE -- To realy delete remove 'SELECT PROD_ROW_ID -- '
FROM PROD_MAIN
WHERE PROD_DATE BETWEEN '19-NOV-2015' AND '19-FEB-2016' AND
(PROD_ROW_ID NOT IN (SELECT PROD_PAR_ID FROM PROD_ENT)
AND -- or OR if the record should be deleted if not present in one of the two tables
PROD_ROW_ID NOT IN (SELECT PROD_ROW_ID FROM PROD))
Be aware, that I guessed that the ..._ID columns are your Primary Key in all three tables
Upvotes: 0