Redshift: is a join to a subquery/CTE consisting of SELECT * from a table equivalent to joining the table itself, or a performance hit?

Question

On Redshift, does a CTE/subquery used in a join incur a performance hit if it is doing a SELECT * from a source table, vs. code that just references and joins to the source table directly? That is, is there any difference in performance between this code:

WITH cte_source_2 AS (SELECT * FROM source_2)
SELECT
    s1.field_1, s2.field_2
FROM
    source_1 AS s1
LEFT JOIN
    cte_source_2 AS s2
    ON
        s1.key_field = s2.key_field

And this code:

SELECT
    s1.field_1, s2.field_2
FROM
    source_1 AS s1
LEFT JOIN
    source_2 AS s2
    ON
        s1.key_field = s2.key_field

I would think not, that the query optimizer would reduce the first version to the second, but am getting conflicting results (mostly I think due to caching).

Another way of phrasing this question is, CTEs aside, and on Redshift specifically, does this:

SELECT
    .....
FROM
    (SELECT * FROM source_1) AS s1
LEFT JOIN
    .......

Perform the same as this:

SELECT
    .....
FROM
    source_1 AS s1
LEFT JOIN
    .......

Unfortunately I do not have the kind of access to get any profiling info. Thanks!

Jon Scott · Accepted Answer

On Redshift, cte's are great for convenience but the query still resolves as a sub-select. See second paragraph of https://docs.aws.amazon.com/redshift/latest/dg/r_WITH_clause.html

Because of that, you are correct. Performance will be the same either way.

This is not the case on postgres where cte's are resolved as temp tables. See first paragraph of https://www.postgresql.org/docs/current/queries-with.html

Redshift: is a join to a subquery/CTE consisting of SELECT * from a table equivalent to joining the table itself, or a performance hit?

Answers (1)

Related Questions