Reputation: 10843
On Redshift, does a CTE/subquery used in a join incur a performance hit if it is doing a SELECT *
from a source table, vs. code that just references and joins to the source table directly? That is, is there any difference in performance between this code:
WITH cte_source_2 AS (SELECT * FROM source_2)
SELECT
s1.field_1, s2.field_2
FROM
source_1 AS s1
LEFT JOIN
cte_source_2 AS s2
ON
s1.key_field = s2.key_field
And this code:
SELECT
s1.field_1, s2.field_2
FROM
source_1 AS s1
LEFT JOIN
source_2 AS s2
ON
s1.key_field = s2.key_field
I would think not, that the query optimizer would reduce the first version to the second, but am getting conflicting results (mostly I think due to caching).
Another way of phrasing this question is, CTEs aside, and on Redshift specifically, does this:
SELECT
.....
FROM
(SELECT * FROM source_1) AS s1
LEFT JOIN
.......
Perform the same as this:
SELECT
.....
FROM
source_1 AS s1
LEFT JOIN
.......
Unfortunately I do not have the kind of access to get any profiling info. Thanks!
Upvotes: 0
Views: 3078
Reputation: 4354
On Redshift, cte's are great for convenience but the query still resolves as a sub-select. See second paragraph of https://docs.aws.amazon.com/redshift/latest/dg/r_WITH_clause.html
Because of that, you are correct. Performance will be the same either way.
This is not the case on postgres where cte's are resolved as temp tables. See first paragraph of https://www.postgresql.org/docs/current/queries-with.html
Upvotes: 1