Reputation: 2505
I'm performing a query that is looking for values in one table that are not in another. For example:
SELECT id FROM table1 WHERE id NOT IN (SELECT id FROM table2);
Both tables have about 1 million rows but only a few hundred that do not match values which are returned in the result set. The query takes about 35 seconds. Doing a show profile on the query shows that mysql is spending most of the time in the state of "preparing". Any ideas on how I can optimize this state or what is actually happening during "preparing"?
The id value in both tables is indexed and of the same type and size.
The entire profile of the query is:
+--------------------------------+----------+
| Status | Duration |
+--------------------------------+----------+
| (initialization) | 0 |
| checking query cache for query | 0 |
| Opening tables | 0.13 |
| System lock | 0 |
| Table lock | 0 |
| init | 0.01 |
| optimizing | 0 |
| statistics | 0 |
| preparing | 0 |
| executing | 0 |
| Sending data | 0 |
| optimizing | 0 |
| statistics | 0 |
| preparing | 34.83 |
| end | 0 |
| query end | 0 |
| freeing items | 0 |
| closing tables | 0 |
| logging slow query | 0 |
+--------------------------------+----------+
Any tips are appreciated.
Thanks.
Upvotes: 2
Views: 2822
Reputation: 332581
There's nothing to optimize - NOT IN
produces an equivalent query plan to LEFT JOIN/IS NULL
in MySQL. Quote:
However, these three methods generate three different plans which are executed by three different pieces of code. The code that executes EXISTS predicate is about 30% less efficient than those that execute index_subquery and LEFT JOIN optimized to use Not exists method.
That’s why the best way to search for missing values in MySQL is using a LEFT JOIN / IS NULL or NOT IN rather than NOT EXISTS.
For more information, see NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: MySQL
Upvotes: 2
Reputation: 300559
SELECT id FROM table1
LEFT JOIN table2 ON table1.id = table2.id
WHERE table2.id IS NULL;
Upvotes: 1
Reputation: 15958
I would left join table2 on id where id is null. This would give you a much faster return of the data.
select
a.id
from
table1 a
left join table2 b on a.id = b.id and b.id is null
Upvotes: 3