Bajji
Bajji

Reputation: 1123

How can I "subtract" one table from another?

I have a master table A, with ~9 million rows. Another table B (same structure) has ~28K rows from table A. What would be the best way to remove all contents of B from table A?

The combination of all columns (~10) are unique. Nothing more in the form a of a unique key.

Upvotes: 7

Views: 19619

Answers (6)

Joe
Joe

Reputation: 42656

Is there a key value (or values) that can be used?

something like

DELETE a
FROM tableA a
INNER JOIN tableB b
on b.id = a.id

Upvotes: 0

Iain Holder
Iain Holder

Reputation: 14262

DELETE FROM TableA WHERE ID IN(SELECT ID FROM TableB)

Should work. Might take a while though.

Upvotes: 5

Nidjarow
Nidjarow

Reputation:

If you have sufficient rights you can create a new table and rename that one to A. To create the new table you can use the following script:

CREATE TABLE TEMP_A AS
SELECT *
FROM   A
MINUS
SELECT *
FROM   B

This should perform pretty good.

Upvotes: 6

Thorsten
Thorsten

Reputation: 13181

If this is something you'll have to do on a regular basis, the first choice should be to try to improve the database design (looking for primary keys, trying to get the "join" condition to be on as few columns as possible).

If that is not possible, the distinct second option is to figure out the "selectivity" of each of the columns (i.e. how many "different" values does each column have, 'name' would be more selective than 'address country' than 'male/female'). The general type of statement I'd suggest would be like this:

Delete from tableA
where exists (select * from tableB
 where tableA.colx1 = tableB.colx1
 and tableA.colx2 = tableB.colx2
 etc. and tableA.colx10 = tableB.colx10).

The idea is to list the columns in order of the selectivity and build an index on colx1, colx2 etc. on tableB. The exact number of columns in tableB would be a result of some trial&measure. (Offset the time for building the index on tableB with the improved time of the delete statement.)

If this is just a one time operation, I'd just pick one of the slow methods outlined above. It's probably not worth the effort to think too much about this when you can just start a statement before going home ...

Upvotes: 1

HLGEM
HLGEM

Reputation: 96600

Delete t2 
from t1 
inner join t2 
  on t1.col1 = t2.col1
  and t1.col2 = t2.col2
  and t1.col3 = t2.col3
  and t1.col4 = t2.col4
  and t1.col5 = t2.col5
  and t1.col6 = t2.col6
  and t1.col7 = t2.col7
  and t1.col8 = t2.col8
  and t1.col9 = t2.col9
  and t1.col10 = t2.col0

This is likely to be very slow as you would have to have every col indexed which is highly unlikely in an environment when a table this size has no primary key, so do it during off peak. What possessed you to have a table with 9 million records and no primary key?

Upvotes: 1

SQLMenace
SQLMenace

Reputation: 135111

one way, just list out all the columns

delete table a
where exists (select 1 from table b where b.Col1= a.Col1 
AND b.Col2= a.Col2 
AND b.Col3= a.Col3
AND b.Col4= a.Col4)

Upvotes: 2

Related Questions