Reputation: 4899
I'm experiencing a weird beahviour on a REST api endpoint. Basically I have two tables, a user table with an id and other relavant fields, and an action table with a uid (which maps the user ids) and several other fields.
I want to extract the user that has performed a particular action and I'm doing something like this:
SELECT * FROM users where id IN (select uid from action WHERE [CONDITIONS] order by [CRITERIA]);
Given the size of my DB this query runs in about two seconds, which is completely unacceptable for my use case.
The weird behaviour comes if I split the query in two subqueries, executing first:
select uid from action WHERE [CONDITIONS] order by [CRITERIA];
Than after having manually concatenated the string to be matched by the IN operator executing:
SELECT * FROM users where id IN [MANUAL CONCAT];
The two queries runs both under 5ms, on the same dataset.
It was my (possibly wrong) understanding that the subquery would have been executed first and then the main query afterwards. Is it wrong? Does MySQL perform the IN subquery EACH time?
UPDATE
If I simply use a join (see code below) is a lot faster (about 10ms), but I still don't get how IN works here.
SELECT distinct * FROM users join action on users.id = action.uid where [CONDITIONS];
I suspect that the fact that the rows actually matched in action are about 5-10 out of 5M counts something.
Upvotes: 0
Views: 40
Reputation: 142298
IN ( SELECT ... )
is very poorly optimized -- the SELECT
is repeatedly evaluated.
In some new version the SELECT
will be materialized and an INDEX
will be automatically generated. Still, a JOIN
is very likely to continue to be faster.
Look at the EXPLAIN SELECT ...
; it may give some clues of what is (or is not) going on. If you would like further discussion, please provide EXPLAIN
, the full SELECT
and SHOW CREATE TABLE
for the table(s).
Upvotes: 1
Reputation: 1269703
First, order by
for in
is not useful, so you can rewrite it without an order by
.
Second, exists
is usually faster than in
:
SELECT u.*
FROM users u
WHERE EXISTS (SELECT 1 FROM action a WHERE a.uid = u.id and [CONDITIONS]);
For optimal performance you want an index on action(uid, . . .)
. The . . .
is for other columns needed for the [CONDITIONS]
.
Upvotes: 1