Cool_Oracle
Cool_Oracle

Reputation: 313

Delete millions of records based on other tables

I have one master table in which the data is getting loaded from an external source. The table structure of the master table - PROD_MAIN is

         PROD_ROW_ID | PROD_VALUE | PROD_TYPE | PROD_DATE

The data gets loaded from master table into two other tables. The two tables are:

         PROD, PROD_ENT 

PROD and PROD_ENT tables are joined using the following condition.

        PROD.PROD_ROW_ID = PROD_ENT.PROD_PAR_ID

Everyday the data is inserted from PROD_MAIN into these two tables, PROD and PROD_ENT tables. Due to some DB issues, the many records got missed loading into PROD and PROD_ENT tables.

So, I need to check the missing records for 3 months, i.e from 19th Nov to 19th Feb; which did not load into PROD and PROD_ENT tables. All these tables have around 200 million of records.

So, I wrote the below query to get the results. However it is giving me zero records. Could you please help on this?

SELECT /*+ PARALLEL (PROD_MAIN,15) */ MH.*
FROM   PROD PMN, 
       PROD_ENT PCH, 
       PROD_MAIN MH
WHERE PMN.PROD_ROW_ID = PCH.PROD_PAR_ID
AND   MH.PROD_ROW_ID(+) = PMN.PROD_ROW_ID
AND   MH.PROD_VALUE(+) = PCH.PROD_VALUE 
AND   MH.PROD_TYPE(+) = PMN.PROD_TYPE
AND   MH.PROD_DATE (+) = PCH.PROD_DATE 
AND   MH.PROD_ROW_ID IS NULL
AND   MH.PROD_VALUE IS NULL
AND   MH.PROD_TYPE IS NULL
AND   MH.PROD_DATE  IS NULL
AND   MH.PROD_DATE  BETWEEN '19-NOV-2015' AND '19-FEB-2016'

***** Edited code *****

In case, 1. If I need to get the records that are PRESENT in PROD_MAIN tables and NOT PRESENT in PROD. 2. And if If I need to get the records that are PRESENT in PROD_MAIN tables and NOT PRESENT in PROD_ENT tables separately, do I need to write a union query like below or is there any other easy way to do so ?

    SELECT MH.*
FROM PROD_MAIN AS MH
LEFT JOIN PROD AS PMN
    ON (PMN.PROD_ROW_ID = MH.PROD_ROW_ID
    AND PMN.PROD_TYPE = MH.PROD_TYPE)
LEFT JOIN PROD_ENT AS PCH
    ON (PCH.PROD_PAR_ID = PMN.PROD_ROW_ID
    AND PCH.PROD_DATE = MH.PROD_DATE
    AND PCH.PROD_VALUE = MH.PROD_VALUE)
WHERE PMN.PROD_ROW_ID IS NULL
AND MH.PROD_DATE  BETWEEN '19-NOV-2015' AND '19-FEB-2016'

UNION

    SELECT MH.*
FROM PROD_MAIN AS MH
LEFT JOIN PROD AS PMN
    ON (PMN.PROD_ROW_ID = MH.PROD_ROW_ID
    AND PMN.PROD_TYPE = MH.PROD_TYPE)
LEFT JOIN PROD_ENT AS PCH
    ON (PCH.PROD_PAR_ID = PMN.PROD_ROW_ID
    AND PCH.PROD_DATE = MH.PROD_DATE
    AND PCH.PROD_VALUE = MH.PROD_VALUE)
WHERE PCH.PROD_PAR_ID IS NULL
AND MH.PROD_DATE  BETWEEN '19-NOV-2015' AND '19-FEB-2016'

Upvotes: 0

Views: 86

Answers (3)

APC
APC

Reputation: 146239

It looks like your logic is flawed. You say the data is loaded from PROD_MAIN into the other tables. Therefore you need to be looking for rows which are in that table but not in PROD and PROD_ENT.

However, your anti-join is filtering on PROD_MAIN columns. These two filters can never both be true:

MH.PROD_DATE  IS NULL
AND   MH.PROD_DATE  BETWEEN '19-NOV-2015' AND '19-FEB-2016'

Likewise, if you are looking for non-existent rows in PROD and PROD_ENT then this join condition can never be true:

PMN.PROD_ROW_ID = PCH.PROD_PAR_ID

Almost certainly you need to be checking that join columns in PROD and PROD_ENT are null. I'm using the ANSI 92 syntax, because it makes outer joins much easier to understand.

SELECT /*+ PARALLEL (PROD_MAIN,15) */ MH.*
FROM   PROD_MAIN MH
    left outer join PROD PMN
        on MH.PROD_ROW_ID = PMN.PROD_ROW_ID   
        and MH.PROD_TYPE = PMN.PROD_TYPE
    left outer join PROD_ENT PCH
        on MH.PROD_ROW_ID = PCH.PROD_PAR_ID 
        and MH.PROD_VALUE = PCH.PROD_VALUE 
        and    MH.PROD_DATE = PCH.PROD_DATE 
where MH.PROD_DATE  BETWEEN date '2015-11-19' AND date '2016-02-19'
AND   PCH.PROD_PAR_ID IS NULL
AND   PMN.PROD_ROW_ID  IS NULL

Not sure whether all those columns are necessary in the join clauses: I have simply copied your join logic.


" I have a small scenario again....can I write a union query like the one updated in main question?"

The way you've written your query will produce results but you won't be able to distinguish the three categories (in PROD but not in PROD_ENT, in PROD_ENT but not in PROD, in neither). In that would be a useful piece of information you need to amend the query slightly:

SELECT 'PROD' as tgt_table, MH.*
FROM PROD_MAIN AS MH
LEFT JOIN PROD AS PMN
    ON (PMN.PROD_ROW_ID = MH.PROD_ROW_ID
    AND PMN.PROD_TYPE = MH.PROD_TYPE)
WHERE PMN.PROD_ROW_ID IS NULL
AND MH.PROD_DATE  BETWEEN '19-NOV-2015' AND '19-FEB-2016'

UNION ALL

SELECT 'PROD_ENT' as tgt_table, MH.*
FROM PROD_MAIN AS MH
LEFT JOIN PROD AS PMN
    ON (PMN.PROD_ROW_ID = MH.PROD_ROW_ID
    AND PMN.PROD_TYPE = MH.PROD_TYPE)
LEFT JOIN PROD_ENT AS PCH
    ON (PCH.PROD_PAR_ID = PMN.PROD_ROW_ID
    AND PCH.PROD_DATE = MH.PROD_DATE
    AND PCH.PROD_VALUE = MH.PROD_VALUE)
WHERE PCH.PROD_PAR_ID IS NULL
AND MH.PROD_DATE  BETWEEN '19-NOV-2015' AND '19-FEB-2016'

Use UNION ALL rather than UNION to avoid the unnecessary sort.

Your version of the Outer Join on PROD_ENT is different from mine. Your version joins on PCH.PROD_PAR_ID = PMN.PROD_ROW_ID so it will give false positives for PROD_ENT records which are actually present but lack an owner record in PROD. If that can never happen then it doesn't matter, but as you appear to be investigating a muck up in your loading process you probably should be as precise as possible.

Upvotes: 1

Shiblu
Shiblu

Reputation: 477

You have to use left join-

SELECT MH.*
FROM PROD_MAIN AS MH
LEFT JOIN PROD AS PMN
    ON (PMN.PROD_ROW_ID = MH.PROD_ROW_ID
    AND PMN.PROD_TYPE = MH.PROD_TYPE)
LEFT JOIN PROD_ENT AS PCH
    ON (PCH.PROD_PAR_ID = PMN.PROD_ROW_ID
    AND PCH.PROD_DATE = MH.PROD_DATE
    AND PCH.PROD_VALUE = MH.PROD_VALUE)
WHERE PMN.PROD_ROW_ID IS NULL OR PCH.PROD_PAR_ID IS NULL
AND MH.PROD_DATE  BETWEEN '19-NOV-2015' AND '19-FEB-2016'

Notice the following line is the key to separate the entries that are in PROD_MAIN but not in PROD or PROD_ENT

WHERE PMN.PROD_ROW_ID IS NULL OR PCH.PROD_PAR_ID IS NULL 

By using left join you are first considering all the rows in the left table i.e. PROD_MAIN and then you are also taking rows those have a match with the right table which is PROD by the comparing PROD_ROW_ID (ignore PROD_TYPE for simplicity).

LEFT JOIN PROD AS PMN ON (PMN.PROD_ROW_ID = MH.PROD_ROW_ID)

If there is a PROD_ROW_ID ( Again Ignoring PROD_TYPE) that is in PROD_MAIN but not in PROD for that row all the columns of PROD will contain null. So in the WHERE clause you can just check any column of the right table to be null

WHERE PMN.PROD_ROW_ID IS NULL

Upvotes: 0

Thomas Voß
Thomas Voß

Reputation: 1165

I need to get those records that are PRESENT in PROD_MAIN tables and NOT PRESENT in PROD AND PROD_ENT tables

Try:

SELECT PROD_ROW_ID -- DELETE -- To realy delete remove 'SELECT PROD_ROW_ID -- '
FROM PROD_MAIN 
WHERE PROD_DATE BETWEEN '19-NOV-2015' AND '19-FEB-2016' AND 
     (PROD_ROW_ID NOT IN (SELECT PROD_PAR_ID FROM PROD_ENT) 
      AND -- or OR if the record should be deleted if not present in one of the two tables
      PROD_ROW_ID NOT IN (SELECT PROD_ROW_ID FROM PROD))

Be aware, that I guessed that the ..._ID columns are your Primary Key in all three tables

Upvotes: 0

Related Questions