Rudie
Rudie

Reputation: 53821

Which subquery is faster?

(Probably a duplicate but I can only find questions and solutions with the JOIN [3] and that's not an option.)

I have two tables. Both very thin (few columns) and very long (many rows). One is the data table (articles) and one is the ACL table (acl).

I want to show only the articles I have access to via acl.some_id. Which subquery is faster?

[1]
SELECT a.title
FROM articles a
WHERE 0 < (
  SELECT COUNT(1)
  FROM acl
  WHERE article_id = a.id AND some_id IN (1, 2, 3)
)

or

[2]
SELECT a.title
FROM articles a
WHERE a.id IN (
  SELECT article_id
  FROM acl WHERE some_id IN (1, 2, 3)
)

My mind would say the second one, because that subquery can be reused for all potentially matching row, so will only be executed once (although the result set will be very large), while the subquery in the first will have to check for EVERY potentially matching row.

There's a third way, but that's not an option, because it would duplicate rows (and GROUP BY is not the solution because I need a COUNT for something else later (and DISTINCT is never a solution!)):

[3]
SELECT a.title
FROM articles a
JOIN acl
  ON acl.article_id = a.id
WHERE acl.some_id IN (1, 2, 3)

Since article_id X exists N times in acl, it would return that row 0 - N times instead of 0 - 1.

There's also a fourth way: EXISTS. Thanks to ypercube.

Related:

Upvotes: 1

Views: 165

Answers (1)

ypercubeᵀᴹ
ypercubeᵀᴹ

Reputation: 115530

I would say [2], too, but MySQL has some blind spots in optimizing IN subqueries, at least up to 5.5. There are several improvements of the query optimizer in the (newly released) 5.6 version. You can read about (semijoins and IN subqueries) in the MySQL docs: MySQL 5.6: Optimizing Subqueries with Semi-Join Transformations.

There are also several improvements of the optimizer in MariaDB (versions 5.3 and 5.5), and some are related to this kind of queries. You can read about in their docs: MariaDB 5.3: Semi-join subquery optimizations.

You can also try the EXISTS version, especially if you are using 5.5 or older version:

-- [4]
SELECT id
FROM articles AS a
WHERE EXISTS (
  SELECT *
  FROM acl 
  WHERE acl.some_id IN (1, 2, 3)
    AND acl.article_id = a.id 
) ;

I think an index on (article_id, some_id) will be useful here - or perhaps the reverse one, it doesn't hurt to try both.


If there is foreign key from acl (article_id) REFERENCES article (id) that you can trust, and you only need the article IDs, you can also get data from one table only:

SELECT DISTINCT article_id
FROM acl 
WHERE acl.some_id IN (1, 2, 3) ;

Of course you should test the several versions you have in your server, the version of MySQL you have (or plan to use), with your data distribution, and of course with large enough sizes of tables. Testing with a few hundred rows will not tell you much.

Upvotes: 5

Related Questions