Reputation: 40471
I'm having a little trouble designing a query that will run fast.
My requirements are:
I have a table SCD (180 million records) and a smaller table LOG (about 300 records).
LOG structure:
REAL_KEY | FIC_KEY
SCD structure:
Another_KEY | SERIAL_KEY ....
I need to delete from SCD all the real keys for which a record with the matching FIC_KEY also exists, so something like:
delete from scd t
where serial_number in (select real_key from log l1)
  and exists (select 1
              from scd s, log l2
              where s.serial_key = l2.fic_key
                and l2.real_key = l1.real_key)
The problem is that I can't reference the results of the first subquery (the l1 alias) inside the second one (l2.real_key = l1.real_key). Also, even if it did run, it would probably take a long time, since SCD contains a lot of records.
Any help would be appreciated.
Upvotes: 1
Views: 585
Reputation: 40471
Well I've managed to come up with the answer. I'll explain the logic:
We had an error over the last few nights: our sources sent us data with the new REAL serials, and because of a separate problem we had disabled the process that fixes those serials in our data warehouse.
This caused some problems: each real serial arrived as a NEW row and was inserted into our SCD tables as a new record instead of updating the existing serial, so we had to reconstruct our tables to the state they were in a few nights ago.
The best and fastest delete query is:
delete from scd t
where t.serial_number in (select s.real_serial_number
                          from UPD_SERIAL s, scd t2
                          where t2.serial_number = s.fic_serial_number)
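Before running a delete like this on a 180M-row table, it can help to preview how many rows it will hit. A minimal read-only sketch reusing the same subquery (assuming the UPD_SERIAL column names above):
-- Count the rows the delete above would remove (read-only check)
select count(*)
from scd t
where t.serial_number in (select s.real_serial_number
                          from UPD_SERIAL s, scd t2
                          where t2.serial_number = s.fic_serial_number);
If the count matches the number of bad rows you expect, run the delete and commit.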
Upvotes: 0
Reputation: 21043
First decide which keys should be deleted.
If you want to delete only the real_keys for which a fic_key exists in SCD, that is as follows:
select real_key
from log join scd on log.fic_key = scd.serial_key
The delete is then:
delete from scd where serial_key in (
  select real_key
  from log join scd on log.fic_key = scd.serial_key);
If you have an index on serial_key in SCD, the delete will be done with two nested-loop (NL) joins, which should be practically instant. If not, the cost is roughly two hash joins of the SCD table with a small table, which should not take ages even for 180M rows. You can also speed it up a bit with a parallel hash join.
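For reference, a sketch of both options, assuming an Oracle-style database; the index name, the parallel degree of 8, and the parallel DML step are illustrative assumptions:
-- Option 1: an index on serial_key lets the delete run as nested-loop joins
create index scd_serial_key_ix on scd (serial_key);

-- Option 2: no index; enable parallel DML and hint a parallel hash join
alter session enable parallel dml;
delete /*+ parallel(scd 8) */ from scd
where serial_key in (
  select real_key
  from log join scd on log.fic_key = scd.serial_key);
commit;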
Upvotes: 2