Reputation: 53
I have three tables, the first is a list of email addresses:
addresses:
id - integer, this is the primary key<br>
email - varchar(255) field holding the address
sent:
sid - integer, foreign key references id in addresses table
received:
rid - integer, foreign key references id in addresses table
Obviously the "sent" and "received" tables have other columns, but they are not important for this question. the sent and received tables are populated every time an email is sent or received and if the address is not already in the "addresses" table, it gets added. The tables can get quite large (100,000+).
Entries for the "sent" and "received" tables are purged on a regular basis and entries removed for various reasons, leaving orphaned entries in the "addresses" table.
I am looking for the most efficient method in MySQL to purge orphaned entries in the "addresses" table. The query I have so far is:
delete
from addresses
where id not in
(select rid from received)
and id not in
(select sid from sent);
This works, but it can take a looong time to run and is definitely not the most efficient way of doing this! I also tried this:
delete
from addresses
where not exists
(select 'x' from sent where sent.sid=addresses.id)
and not exists
(select 'x' from rceieved where recieved.rid=addresses.id);
This was a bit quicker, but still takes a long time, I suspect I need to use the JOIN syntax but my sql knowledge has run out on me at this point !
Upvotes: 0
Views: 858
Reputation: 1178
Did some testing using 2 300k myisam tables which contained 2 id columns (and several other non-identical columns). The ids were identical except for 2 records in one table. Tried the 3 methods mentioned to find these ids:
WHERE NOT EXISTS
LEFT JOIN
IN ()
Making sure to use SQL_NO_CACHE and all queries performed identically, the server returned the two results in ~14.6 seconds.
The differences mentioned above must either be caching, differing versions of mysql and/or general server configuration.
Upvotes: 0
Reputation: 1
I'm sorry I can't really give a definite answer. But I had a similar problem, and after looking around it seems there are only two main choices:
WHERE x NOT IN y
LEFT JOIN x ON y WHERE z IS NULL
I tried both methods by comparing two tables, of 2822291 and 916626 records respectively.
The performance conclusions are as follows:
Thus, concluding, as of now (08-2013) is seems that option 1 is still the faster way to go. Using NOT EXISTS
might be even faster, but performance changes there aren't dramatic compared to type 1.
I hope this helps anyone out eventually.
Upvotes: 0
Reputation: 5605
This should do the trick
DELETE adresses.* FROM adresses
LEFT JOIN sent ON sent.sid=adresses.id
LEFT JOIN received ON received.rid=adresses.id
WHERE sent.sid IS NULL AND received.rid IS NULL
Upvotes: 1
Reputation: 13465
Try this: delete from adresses a left join sent s on (a.sentid=s.id) where s.id is null
Upvotes: 0