Reputation: 29720
Ive got a problem where I want to delete orphaned records. I was wondering what the syntax is for deleting records that aren't in a join.
So if my query to get the stuff (that I dont want to delete is):
select * from tbl_user tu
inner join tbl_user_group_xref tugx on tu.userid=tugx.userid
Then how do I 1) get the stuff that isnt in the clause and 2) delete it?
Like to do it without using arrays but an array solution would still be useful for learning purposes.
Upvotes: 2
Views: 7371
Reputation:
There's an optimization to Duncan Howe's answer that I know works in MySQL and may work with other servers. It probably also works for t-clausen.dk's answer in MySQL.
If you are deleting rows from table t1 that don’t have corresponding rows in t2 and both tables are very large then the server can end up getting swamped with disk seeks. I found that performance can be improved a lot if you can force the server to load t2's index into memory before running the query and then, in the query, force the server to ignore t1's index. That makes the server do a sequential scan of t1, which will be an efficient use of disk. The server steps through each row of t1 looking up t2's index, which is in memory, to determine if the row should be deleted. The disk seeks are thus eliminated and disk IO rate is very high, which keeps the CPU busy.
For example:
delete tbl_user
from tbl_user tu ignore key (primary)
left join tbl_user_group_xref tugx
use key (userid) on tu.userid=tugx.userid
where tugx.userid is null
(I'm assuming that tbl_user.userid
is its table's PK and the index on tbl_user_group_xref.userid
is named userid
. If not, change the respective key names.)
Forcing a server to load an index into memory is technology-specific. In MySQL for MyISAM tables you can use load index into cache
. Recreating an index from scratch (which is very fast in MySQL) might leave it in cache (and would have the nice side effect of balancing the B-tree).
I've seen examples with well over 100x improvement using this optimization. So long as you can cache t2's index, you can process very large tables efficiently.
Upvotes: 4
Reputation: 44316
This delete statement performs optimal as well as the script from Duncan Howe. I gave Duncan Howe a plus because it is correct and I didn't know that syntax.
delete tu
from tbl_user tu
where not exists (select 1 from tbl_user_group_xref where userid = tu.userid)
Upvotes: 2
Reputation: 3025
Try
delete tbl_user
from tbl_user tu
left join tbl_user_group_xref tugx on tu.userid=tugx.userid
where tugx.userid is null
Upvotes: 10
Reputation: 9638
You can use the NOT IN statement like described here http://www.techonthenet.com/sql/in.php
Basically you would write a select query from the table where you would like to delete records from and then perform a NOT IN on a subquery that joins the two tables.
I don't know if this is very good performance wise though.
Edit: basically exactly what Heximal says.
Upvotes: 1
Reputation: 10517
e.g.:
delete from tbl_user tu
where
user_id not in (
select
user_id
from
tbl_user tu
inner join
tbl_user_group_xref tugx on tu.userid=tugx.userid
)
Upvotes: 6