Exitos
Exitos

Reputation: 29720

How to delete something not in the join of 2 tables?

Ive got a problem where I want to delete orphaned records. I was wondering what the syntax is for deleting records that aren't in a join.

So if my query to get the stuff (that I dont want to delete is):

select * from tbl_user tu
inner join tbl_user_group_xref tugx on tu.userid=tugx.userid

Then how do I 1) get the stuff that isnt in the clause and 2) delete it?

Like to do it without using arrays but an array solution would still be useful for learning purposes.

Upvotes: 2

Views: 7371

Answers (5)

user213154
user213154

Reputation:

There's an optimization to Duncan Howe's answer that I know works in MySQL and may work with other servers. It probably also works for t-clausen.dk's answer in MySQL.

If you are deleting rows from table t1 that don’t have corresponding rows in t2 and both tables are very large then the server can end up getting swamped with disk seeks. I found that performance can be improved a lot if you can force the server to load t2's index into memory before running the query and then, in the query, force the server to ignore t1's index. That makes the server do a sequential scan of t1, which will be an efficient use of disk. The server steps through each row of t1 looking up t2's index, which is in memory, to determine if the row should be deleted. The disk seeks are thus eliminated and disk IO rate is very high, which keeps the CPU busy.

For example:

delete tbl_user 
  from tbl_user tu ignore key (primary)
  left join tbl_user_group_xref tugx 
    use key (userid) on tu.userid=tugx.userid
where tugx.userid is null

(I'm assuming that tbl_user.userid is its table's PK and the index on tbl_user_group_xref.userid is named userid. If not, change the respective key names.)

Forcing a server to load an index into memory is technology-specific. In MySQL for MyISAM tables you can use load index into cache. Recreating an index from scratch (which is very fast in MySQL) might leave it in cache (and would have the nice side effect of balancing the B-tree).

I've seen examples with well over 100x improvement using this optimization. So long as you can cache t2's index, you can process very large tables efficiently.

Upvotes: 4

t-clausen.dk
t-clausen.dk

Reputation: 44316

This delete statement performs optimal as well as the script from Duncan Howe. I gave Duncan Howe a plus because it is correct and I didn't know that syntax.

delete tu
from tbl_user tu 
where not exists (select 1 from tbl_user_group_xref where userid = tu.userid)

Upvotes: 2

Duncan Howe
Duncan Howe

Reputation: 3025

Try

delete tbl_user 
  from tbl_user tu 
  left join tbl_user_group_xref tugx on tu.userid=tugx.userid 
where tugx.userid is null

Upvotes: 10

Roy T.
Roy T.

Reputation: 9638

You can use the NOT IN statement like described here http://www.techonthenet.com/sql/in.php

Basically you would write a select query from the table where you would like to delete records from and then perform a NOT IN on a subquery that joins the two tables.

I don't know if this is very good performance wise though.

Edit: basically exactly what Heximal says.

Upvotes: 1

heximal
heximal

Reputation: 10517

e.g.:

delete from tbl_user tu
where 
  user_id not in (
    select 
      user_id 
    from 
      tbl_user tu
    inner join 
      tbl_user_group_xref tugx on tu.userid=tugx.userid
) 

Upvotes: 6

Related Questions