chadgh
chadgh

Reputation: 9753

MySQL performance using IN predicate

If I run the following queries each one returns quickly (0.01 sec) and gives me my desired result.

SELECT tagId FROM tag WHERE name='programming'

SELECT COUNT(DISTINCT workcode) FROM worktag WHERE tagId=123 OR tagId=124

(assume the two tagId numbers were the results from the first query)

I would like to combine these queries so I only have to run it once:

SELECT COUNT(DISTINCT workcode) FROM worktag WHERE tagId IN (SELECT tagId FROM tag WHERE name='programming')

However this query completes in about 1 min and 20 sec. I have indexes on worktag.workcode, worktag.tagId, tag.tagId, and tag.name.

If I run DESCRIBE on the queries the first two use the indexes and the second one uses the index for the subquery (on the tag table) but doesn't use any indexes on the worktag table.

Does anyone know why this might be?

NOTE: the worktag table has over 18 million records in it.

Upvotes: 2

Views: 1119

Answers (6)

user187477
user187477

Reputation: 11

MySQL generally doesn't do so well with subqueries, even independent ones. The posters who discussed joins are right - if you've got a choice, use a join. If you can't easily use a join (ie, foo.x in (select y from bar where y = xxx limit 10)), you're better off running the limit into a temporary IN MEMORY table and using a join on it.

If you're using MySQL a lot, use EXPLAIN and you'll see how it's using your indexes and such things.

Upvotes: 1

erikkallen
erikkallen

Reputation: 34401

I guess the optimizer does some bad guess. Replacing the query with an inner join might help.

Upvotes: 0

Gratzy
Gratzy

Reputation: 9389

SELECT COUNT(DISTINCT workcode) 
FROM worktag 
inner join tag on worktag.tagid = tag.tagid
WHERE tag.name='programming'

Upvotes: 1

tangens
tangens

Reputation: 39733

A database admin told me recently, that the syntax WHERE x IN ( ... ) is a pain for the database. A join is almost always better:

SELECT COUNT(DISTINCT wt.workcode) 
  FROM worktag wt, tag t 
 WHERE wt.tagId = t.tagId 
   AND t.name='programming'

Upvotes: 1

Zed
Zed

Reputation: 57658

Why don't you use a join instead of a subquery?

SELECT COUNT(DISTINCT workcode)
FROM worktag
LEFT JOIN tag
  ON worktag.tagId = tag.tagID
WHERE tag.name = 'programming'

P.S.: Seems to be reported as bug.

Upvotes: 2

Larry Lustig
Larry Lustig

Reputation: 50970

Have you tried:

SELECT COUNT(DISTINCT workcode) FROM worktag WHERE tagId IN (123, 124)

?

I'm not a MySQL expert, but it looks to me like you might be looking at a significant failure of the query optimizer.

On the other had, good for MySQL that it optimizes the OR in the second statement. I know databases that will successfully optimize the IN (), but not the OR version of the same logical request.

Upvotes: 0

Related Questions