Reputation: 316
I am trying to delete duplicate data from a database where the data is across multiple tables.
I have 3 tables, Person, Person_Address_Map, and Address. I want to delete all the duplicate rows except one in my person table but I want to also check that the Address data is the same (but not delete any Rows in the Address table).
I have the following which deletes all but one duplicate entry in my Person table.
DELETE
FROM Person p
WHERE EXISTS
(SELECT 1
FROM Person temp
Where p.name = temp.name AND
p.id < temp.id);
Person Person_Address_Map Address
| ID | Name | Age | | Person_ID | Address_ID | | ID | City | State |
------------------- -------------------------- ---------------------
| 1 | Bob | 20 | | 1 | 1 | | 1 | NYC | NY |
| 2 | Bob | 20 | | 2 | 2 | | 2 | NYC | NY |
| 3 | Jon | 50 | | 3 | 3 | | 3 | LA | CA |
| 4 | Jon | 50 | | 4 | 4 | | 4 | SF | CA |
Now for example my Address table has a city. The Person_Address_Map contains a Person ID and Address ID. How can I change this query so it doesn't just check that the 'name' of the Person is equal but also the 'city' in which they are mapped to in the Address table is equal? (without deleting any data from the Address table)
After deleting and leaving one duplicate I would expect the following left in my person table. With the Person_Address_Map also being updated.
Person Person_Address_Map Address
| ID | Name | Age | | Person_ID | Address_ID | | ID | City | State |
------------------- -------------------------- ---------------------
| 1 | Bob | 20 | | 1 | 1 | | 1 | NYC | NY |
| 2 | NYC | NY |
| 3 | Jon | 50 | | 3 | 3 | | 3 | LA | CA |
| 4 | Jon | 50 | | 4 | 4 | | 4 | SF | CA |
Upvotes: 0
Views: 1730
Reputation: 879
I guess this should work,
DELETE
FROM PERSON P
WHERE EXISTS (SELECT 1
FROM (WITH TABLE_ AS (SELECT P.ID,
P.NAME,
P.AGE,
PERSON_ID,
ADDRESS_ID,
A.ID ADD_ID,
A.CITY,
A.STATE
FROM PERSON P,
PERSON_ADDRESS_MAP PA,
ADDRESS_ A
WHERE P.ID = PA.PERSON_ID
AND PA.ADDRESS_ID = A.ID)
SELECT A1.ID, TABLE_.NAME
FROM ADDRESS_ A1, TABLE_
WHERE TABLE_.ADD_ID != A1.ID
AND TABLE_.CITY = A1.CITY
AND TABLE_.STATE = A1.STATE) DELETABLE_
WHERE P.ID > DELETABLE_.ID
AND P.NAME = DELETABLE_.NAME
);
This is the table definitions I used,
CREATE TABLE person (
ID NUMBER(5),
NAME VARCHAR2(15) ,
AGE NUMBER(3));
CREATE TABLE Person_Address_Map (
Person_ID NUMBER(5),
Address_ID NUMBER(5)
);
CREATE TABLE Address_ (
ID NUMBER(5),
City VARCHAR2(15) ,
State VARCHAR2(15));
INSERT INTO person VALUES (1,'Bob',20);
INSERT INTO person VALUES (2,'Bob',20);
INSERT INTO person VALUES (3,'Jon',50);
INSERT INTO person VALUES (4,'Jon',50);
INSERT INTO Person_Address_Map VALUES (1,1);
INSERT INTO Person_Address_Map VALUES (2,2);
INSERT INTO Person_Address_Map VALUES (3,3);
INSERT INTO Person_Address_Map VALUES (4,4);
INSERT INTO Address_ VALUES (1,'NYC','NY');
INSERT INTO Address_ VALUES (2,'NYC','NY');
INSERT INTO Address_ VALUES (3,'LA','CA');
INSERT INTO Address_ VALUES (4,'CA','CA');
I did not check the cascade constraints.. I guess it should not give you any problems.
Upvotes: 1