Reputation: 870
Given posts
and posts_tags
:
CREATE TABLE posts (
id SERIAL PRIMARY KEY,
external_id varchar(10) UNIQUE,
title varchar(255),
body TEXT
)
CREATE TABLE posts_tags (
id SERIAL PRIMARY KEY,
post_external_id varchar(10) REFERENCES posts(external_id),
tag varchar(255),
UNIQUE KEY (post_external_id, tag)
)
As you can see the above tables store posts and their tags for a backend serving REST APIs.
There is a get-post-by-external-id
API that will return the post with its tags identified by the post's external_id
. Given this use case, I'm wondering which query below executes faster and consumes less resources?
SELECT posts.*, posts_tags.tag
FROM (SELECT * FROM posts WHERE external_id = ?) AS posts
LEFT JOIN posts_tags ON posts.external_id = posts_tags.post_external_id
SELECT posts.*, posts_tags.tag
FROM posts
LEFT JOIN posts_tags ON posts.external_id = posts_tags.post_external_id
WHERE posts.external_id = ?
I'm not sure, but I guess the WHERE
filters the intermediate results after the LETF JOIN
happens?
Upvotes: 1
Views: 274
Reputation: 31812
There is no need for a subquery. The engine doesn't evaluate/execute the query in the same order as you write. The optimizer will generate an "execution path" as optimal as possible (though it doesn't work perfect). However for this particular case it's documented in WHERE Clause Optimization:
For each table in a join, a simpler
WHERE
is constructed to get a fastWHERE
evaluation for the table and also to skip rows as soon as possible.
So your guess:
I guess the WHERE filters the intermediate results after the LETF JOIN happens?
is wrong. Given an index on posts.external_id
the engine will only "touch" the matching row(s) and perform the join with posts_tags
only for that/those row(s).
And from my experience I can also tell you, that the second (trivial) query is perfectly fine performance wise.
I would though not use that query for another reason. You get the post data duplicated per related tag. Consider you have a post containing a 10KB text and having 10 tags. You will get 10 rows (one per tag) with the same text and fetch together 100KB of data. Then you will need to write code to "deduplicate" the data.
I would rather either run two separate queries:
SELECT posts.*
FROM posts
WHERE posts.external_id = ?
and
SELECT posts_tags.tag
FROM posts_tags
WHERE posts_tags.post_external_id = ?
and "merge" the result in application code
or aggregate the tags with GROUP_CONCAT()
SELECT posts.*, GROUP_CONCAT(posts_tags.tag) as tags
FROM posts
LEFT JOIN posts_tags ON posts.external_id = posts_tags.post_external_id
WHERE posts.external_id = ?
GROUP BY posts.external_id
or with JSON_ARRAYAGG()
SELECT posts.*, JSON_ARRAYAGG(posts_tags.tag) as tags
FROM posts
LEFT JOIN posts_tags ON posts.external_id = posts_tags.post_external_id
WHERE posts.external_id = ?
GROUP BY posts.external_id
Parsing a comma delimited list or a JSON array should be simple in any application language.
Upvotes: 1