elbuild
elbuild

Reputation: 4899

MySQL IN with subquery performance weird behaviour

I'm experiencing a weird beahviour on a REST api endpoint. Basically I have two tables, a user table with an id and other relavant fields, and an action table with a uid (which maps the user ids) and several other fields.

I want to extract the user that has performed a particular action and I'm doing something like this:

SELECT * FROM users where id IN (select uid from action WHERE [CONDITIONS] order by [CRITERIA]);

Given the size of my DB this query runs in about two seconds, which is completely unacceptable for my use case.

The weird behaviour comes if I split the query in two subqueries, executing first:

select uid from action WHERE [CONDITIONS] order by [CRITERIA];

Than after having manually concatenated the string to be matched by the IN operator executing:

SELECT * FROM users where id IN [MANUAL CONCAT];

The two queries runs both under 5ms, on the same dataset.

It was my (possibly wrong) understanding that the subquery would have been executed first and then the main query afterwards. Is it wrong? Does MySQL perform the IN subquery EACH time?

UPDATE

If I simply use a join (see code below) is a lot faster (about 10ms), but I still don't get how IN works here.

SELECT distinct * FROM users join action on users.id = action.uid where [CONDITIONS];

I suspect that the fact that the rows actually matched in action are about 5-10 out of 5M counts something.

Upvotes: 0

Views: 40

Answers (2)

Rick James
Rick James

Reputation: 142298

IN ( SELECT ... ) is very poorly optimized -- the SELECT is repeatedly evaluated.

In some new version the SELECT will be materialized and an INDEX will be automatically generated. Still, a JOIN is very likely to continue to be faster.

Look at the EXPLAIN SELECT ...; it may give some clues of what is (or is not) going on. If you would like further discussion, please provide EXPLAIN, the full SELECT and SHOW CREATE TABLE for the table(s).

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269703

First, order by for in is not useful, so you can rewrite it without an order by.

Second, exists is usually faster than in:

SELECT u.*
FROM users u
WHERE EXISTS (SELECT 1 FROM action a WHERE a.uid = u.id and [CONDITIONS]);

For optimal performance you want an index on action(uid, . . .). The . . . is for other columns needed for the [CONDITIONS].

Upvotes: 1

Related Questions