Michael Shnitzer
Michael Shnitzer

Reputation: 2505

MySQL Query Optimization

I'm performing a query that is looking for values in one table that are not in another. For example:

SELECT id FROM table1 WHERE id NOT IN (SELECT id FROM table2);

Both tables have about 1 million rows but only a few hundred that do not match values which are returned in the result set. The query takes about 35 seconds. Doing a show profile on the query shows that mysql is spending most of the time in the state of "preparing". Any ideas on how I can optimize this state or what is actually happening during "preparing"?

The id value in both tables is indexed and of the same type and size.

The entire profile of the query is:

+--------------------------------+----------+
| Status                         | Duration |
+--------------------------------+----------+
| (initialization)               | 0        | 
| checking query cache for query | 0        | 
| Opening tables                 | 0.13     | 
| System lock                    | 0        | 
| Table lock                     | 0        | 
| init                           | 0.01     | 
| optimizing                     | 0        | 
| statistics                     | 0        | 
| preparing                      | 0        | 
| executing                      | 0        | 
| Sending data                   | 0        | 
| optimizing                     | 0        | 
| statistics                     | 0        | 
| preparing                      | 34.83    | 
| end                            | 0        | 
| query end                      | 0        | 
| freeing items                  | 0        | 
| closing tables                 | 0        | 
| logging slow query             | 0        | 
+--------------------------------+----------+

Any tips are appreciated.

Thanks.

Upvotes: 2

Views: 2822

Answers (3)

OMG Ponies
OMG Ponies

Reputation: 332581

There's nothing to optimize - NOT IN produces an equivalent query plan to LEFT JOIN/IS NULL in MySQL. Quote:

However, these three methods generate three different plans which are executed by three different pieces of code. The code that executes EXISTS predicate is about 30% less efficient than those that execute index_subquery and LEFT JOIN optimized to use Not exists method.

That’s why the best way to search for missing values in MySQL is using a LEFT JOIN / IS NULL or NOT IN rather than NOT EXISTS.

For more information, see NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: MySQL

Upvotes: 2

Mitch Wheat
Mitch Wheat

Reputation: 300559

SELECT id FROM table1
LEFT JOIN table2 ON table1.id = table2.id
WHERE table2.id IS NULL;

Upvotes: 1

Avitus
Avitus

Reputation: 15958

I would left join table2 on id where id is null. This would give you a much faster return of the data.

select
     a.id
from
     table1 a 
     left join table2 b on a.id = b.id and b.id is null

Upvotes: 3

Related Questions