BCS

Reputation: 78516

MySQL select where not in table

I have 2 tables (A and B) with the same primary keys. I want to select all rows that are in A and not in B. The following works:

select * from A where not exists (select * from B where A.pk=B.pk);

However, it seems quite slow (~2 sec with only 100k rows in A and 3-10k fewer in B).

Is there a better way to run this? Perhaps as a left join?

select * from A left join B on A.x=B.y where B.y is null;

On my data this seems to run slightly faster (~10%), but what about in general?

Upvotes: 59

Views: 74599

Answers (5)

user3136147

Reputation: 17

This helped me a lot. In my experience, joins return results faster than subqueries:

SELECT t1.id FROM tbl1 t1
LEFT OUTER JOIN tbl2 t2 ON t1.id = t2.id
WHERE t1.id >= 100 AND t2.id IS NULL;

Upvotes: -2

ChoNuff

Reputation: 822

Joins are generally faster (in MySQL), but you should also consider your indexing scheme if you find that the query is still slow. Generally, any column set up as a foreign key (using InnoDB) will already have an index. If you're using MyISAM, make sure that any columns in the ON clause are indexed, and consider also adding any columns in the WHERE clause to the end of the index to make it a covering index. This gives the engine access to all the data it needs in the index, removing the need for a second lookup in the table data. Keep in mind that this will slow down inserts, updates, and deletes, but it can significantly speed up the query.
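
As a rough sketch of the indexing advice above, assuming the join column B.y from the question is not already indexed (idx_b_y is a hypothetical index name; if the column is a primary key, it already has an index and the CREATE INDEX step is unnecessary):

```sql
-- List the indexes that already exist on B before adding another one
SHOW INDEX FROM B;

-- Index the column used in the ON clause (idx_b_y is a hypothetical name)
CREATE INDEX idx_b_y ON B (y);
```

Because the left-join query in the question only reads B.y from table B, a single-column index on y should already cover B's side of that query.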

Upvotes: 2

Nick Berardi

Reputation: 54854

I think your last statement is the best way. You can also try

SELECT A.*
FROM A
LEFT JOIN B ON A.x = B.y
WHERE B.y IS NULL;

Upvotes: 63

Dave Rix

Reputation: 1679

I also use left joins with a "where table2.id is null" type of condition.

It certainly seems to be more efficient than the nested query option.

Upvotes: 2

Bill Karwin

Reputation: 562260

I use queries in the format of your second example. A join is usually more scalable than a correlated subquery.
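
One way to check this on your own data is to compare the execution plans of the two forms. A minimal sketch, assuming the table and column names from the question (A.pk and B.pk); the plans you actually see will depend on your MySQL version, indexes, and data:

```sql
-- Plan for the correlated-subquery form from the question
EXPLAIN
SELECT *
FROM A
WHERE NOT EXISTS (SELECT * FROM B WHERE A.pk = B.pk);

-- Plan for the left-join form, joined on the same pk columns
-- so the two plans are compared like for like
EXPLAIN
SELECT A.*
FROM A
LEFT JOIN B ON A.pk = B.pk
WHERE B.pk IS NULL;
```

If the first plan shows a DEPENDENT SUBQUERY row for B, the subquery is being evaluated once per row of A. Newer MySQL versions (8.0.17 and later) can rewrite the NOT EXISTS form into an antijoin internally, so the two plans may look much closer there than on older versions.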

Upvotes: 37
