Reputation: 1123
I have a master table A
, with ~9 million rows. Another table B
(same structure) has ~28K rows from table A
. What would be the best way to remove all contents of B
from table A
?
The combination of all columns (~10) are unique. Nothing more in the form a of a unique key.
Upvotes: 7
Views: 19619
Reputation: 42656
Is there a key value (or values) that can be used?
something like
DELETE a
FROM tableA a
INNER JOIN tableB b
on b.id = a.id
Upvotes: 0
Reputation: 14262
DELETE FROM TableA WHERE ID IN(SELECT ID FROM TableB)
Should work. Might take a while though.
Upvotes: 5
Reputation:
If you have sufficient rights you can create a new table and rename that one to A. To create the new table you can use the following script:
CREATE TABLE TEMP_A AS
SELECT *
FROM A
MINUS
SELECT *
FROM B
This should perform pretty good.
Upvotes: 6
Reputation: 13181
If this is something you'll have to do on a regular basis, the first choice should be to try to improve the database design (looking for primary keys, trying to get the "join" condition to be on as few columns as possible).
If that is not possible, the distinct second option is to figure out the "selectivity" of each of the columns (i.e. how many "different" values does each column have, 'name' would be more selective than 'address country' than 'male/female'). The general type of statement I'd suggest would be like this:
Delete from tableA
where exists (select * from tableB
where tableA.colx1 = tableB.colx1
and tableA.colx2 = tableB.colx2
etc. and tableA.colx10 = tableB.colx10).
The idea is to list the columns in order of the selectivity and build an index on colx1, colx2 etc. on tableB. The exact number of columns in tableB would be a result of some trial&measure. (Offset the time for building the index on tableB with the improved time of the delete statement.)
If this is just a one time operation, I'd just pick one of the slow methods outlined above. It's probably not worth the effort to think too much about this when you can just start a statement before going home ...
Upvotes: 1
Reputation: 96600
Delete t2
from t1
inner join t2
on t1.col1 = t2.col1
and t1.col2 = t2.col2
and t1.col3 = t2.col3
and t1.col4 = t2.col4
and t1.col5 = t2.col5
and t1.col6 = t2.col6
and t1.col7 = t2.col7
and t1.col8 = t2.col8
and t1.col9 = t2.col9
and t1.col10 = t2.col0
This is likely to be very slow as you would have to have every col indexed which is highly unlikely in an environment when a table this size has no primary key, so do it during off peak. What possessed you to have a table with 9 million records and no primary key?
Upvotes: 1
Reputation: 135111
one way, just list out all the columns
delete table a
where exists (select 1 from table b where b.Col1= a.Col1
AND b.Col2= a.Col2
AND b.Col3= a.Col3
AND b.Col4= a.Col4)
Upvotes: 2