sideshowbob
sideshowbob

Reputation: 316

Delete duplicates sql across multiple tables

I am trying to delete duplicate data from a database where the data is across multiple tables.

I have 3 tables, Person, Person_Address_Map, and Address. I want to delete all the duplicate rows except one in my person table but I want to also check that the Address data is the same (but not delete any Rows in the Address table).

I have the following which deletes all but one duplicate entry in my Person table.

  DELETE
  FROM Person p
  WHERE EXISTS
    (SELECT 1 
    FROM Person temp
    Where p.name = temp.name AND
    p.id < temp.id);


 Person                    Person_Address_Map             Address
 | ID | Name | Age |       | Person_ID | Address_ID |     | ID | City | State |
 -------------------       --------------------------     ---------------------
 | 1  | Bob  |  20 |       |     1     |     1      |     | 1  | NYC  |  NY   | 
 | 2  | Bob  |  20 |       |     2     |     2      |     | 2  | NYC  |  NY   |
 | 3  | Jon  |  50 |       |     3     |     3      |     | 3  | LA   |  CA   | 
 | 4  | Jon  |  50 |       |     4     |     4      |     | 4  | SF   |  CA   |

Now for example my Address table has a city. The Person_Address_Map contains a Person ID and Address ID. How can I change this query so it doesn't just check that the 'name' of the Person is equal but also the 'city' in which they are mapped to in the Address table is equal? (without deleting any data from the Address table)

After deleting and leaving one duplicate I would expect the following left in my person table. With the Person_Address_Map also being updated.

 Person                    Person_Address_Map             Address
 | ID | Name | Age |       | Person_ID | Address_ID |     | ID | City | State |
 -------------------       --------------------------     ---------------------
 | 1  | Bob  |  20 |       |     1     |     1      |     | 1  | NYC  |  NY   | 
                                                          | 2  | NYC  |  NY   |
 | 3  | Jon  |  50 |       |     3     |     3      |     | 3  | LA   |  CA   | 
 | 4  | Jon  |  50 |       |     4     |     4      |     | 4  | SF   |  CA   |

Upvotes: 0

Views: 1730

Answers (1)

Shankar
Shankar

Reputation: 879

I guess this should work,

DELETE
  FROM PERSON P
 WHERE EXISTS (SELECT 1
          FROM (WITH TABLE_ AS (SELECT P.ID,
                                       P.NAME,
                                       P.AGE,
                                       PERSON_ID,
                                       ADDRESS_ID,
                                       A.ID ADD_ID,
                                       A.CITY,
                                       A.STATE
                                  FROM PERSON             P,
                                       PERSON_ADDRESS_MAP PA,
                                       ADDRESS_           A
                                 WHERE P.ID = PA.PERSON_ID
                                   AND PA.ADDRESS_ID = A.ID)
                 SELECT A1.ID, TABLE_.NAME
                   FROM ADDRESS_ A1, TABLE_
                  WHERE TABLE_.ADD_ID != A1.ID
                    AND TABLE_.CITY = A1.CITY
                    AND TABLE_.STATE = A1.STATE) DELETABLE_
                  WHERE P.ID > DELETABLE_.ID
                    AND P.NAME = DELETABLE_.NAME
        );

This is the table definitions I used,

CREATE TABLE person (
         ID         NUMBER(5),
         NAME       VARCHAR2(15) ,
         AGE        NUMBER(3));
CREATE TABLE Person_Address_Map (
         Person_ID          NUMBER(5),
         Address_ID          NUMBER(5)
         );         
CREATE TABLE Address_ (
         ID         NUMBER(5),
         City        VARCHAR2(15) ,
         State        VARCHAR2(15));         

INSERT INTO person VALUES (1,'Bob',20);         
INSERT INTO person VALUES (2,'Bob',20);
INSERT INTO person VALUES (3,'Jon',50);
INSERT INTO person VALUES (4,'Jon',50);

INSERT INTO Person_Address_Map VALUES (1,1);
INSERT INTO Person_Address_Map VALUES (2,2);
INSERT INTO Person_Address_Map VALUES (3,3);
INSERT INTO Person_Address_Map VALUES (4,4);

INSERT INTO Address_ VALUES (1,'NYC','NY');
INSERT INTO Address_ VALUES (2,'NYC','NY');
INSERT INTO Address_ VALUES (3,'LA','CA');
INSERT INTO Address_ VALUES (4,'CA','CA');

I did not check the cascade constraints.. I guess it should not give you any problems.

Upvotes: 1

Related Questions