Unexpected behaviour in aws Redshift

Question

I have observed some unexpected behaviour in AWS Redshift. Below are the examples that illustrate it.

This query:

WITH var_table AS (
SELECT
    8 AS x
)
SELECT * FROM real_table
WHERE real_table_col_1 = (select x from var_table)
AND real_table_col_2 = (select x from var_table)

fails with:

ERROR: Assert
  Detail: 
  -----------------------------------------------
  error:  Assert
  code:      1000
  context:   query->a_last_plan()->m_locus == LocusXNode - 
  query:     48508061
  location:  xen_execute.cpp:8916
  process:   padbmaster [pid=2659]
  -----------------------------------------------

This query works normally:

create temporary table var_table AS (
SELECT
    8 AS x
);

SELECT * FROM real_table
WHERE real_table_col_1 = (select x from var_table)
AND real_table_col_2 = (select x from var_table)

And also this query works normally:

WITH var_table AS (
SELECT
    8 AS x
)
SELECT * FROM real_table
WHERE real_table_col_1 = (select x from var_table)

Only difference between this one and the first one is that I extract x only once.

Is there any explanation for this and I just don't understand some nitty-gritties of Redshift or is it a legitimate bug?

EDIT:

The query format above is a simplified version which I thought was the minimal reproducible example. I have since found out that this issue doesn't happen in some simpler queries if I use the same pattern.

I will keep trying to find the minimal reproducible example for some time, but otherwise I will just keep the create temporary table for now.

Bill Weiner · Accepted Answer

Assert errors are due to some part of the query plan not making sense when the query is executed. Is it a bug? Possibly, but likely should be handled more gracefully.

I suspect that the error is arising due to the comparison of a value to a column with '=' and doing this multiple times and the table reference being a CTE (no meta data). The query planner is setting up for the general case for this type of comparison which could be mapped to an IN clause or a JOIN. As suspect it is trying to map to a pair of JOINs and things are going sideways.

A few things to try to see if it is such an issue:

Remove the unneeded second table scan (unnecessarily rescanning tables is not a best practice):

WITH var_table AS (
SELECT
    8 AS x
)
SELECT * FROM real_table
WHERE real_table_col_1 = (select x from var_table)
AND real_table_col_2 = real_table_col_1;

Tell the query planner that you only want 1 row from this table:

WITH var_table AS (
SELECT
    8 AS x
)
SELECT * FROM real_table
WHERE real_table_col_1 = (select top 1 x from var_table)
AND real_table_col_2 = (select top 1 x from var_table)

Move to an explicit JOIN syntax:

WITH var_table AS (
SELECT
    8 AS x
)
SELECT * FROM real_table rt
JOIN var_table vt
ON rt.real_table_col_1 = vt.x 
AND rt.real_table_col_2 = vt.x;

See if you can't recreate the issue by explicitly using redundant joins:

WITH var_table AS (
SELECT
    8 AS x
)
SELECT * FROM real_table rt
JOIN vart_table vt1 ON rt.real_table_col_1 = vt1.x
JOIN vart_table vt2 ON rt.real_table_col_2 = vt2.x;

I'm not confident that this last query will fail but I expect it is similar to what the query planner is trying to do with your original query.

Unexpected behaviour in aws Redshift

Answers (1)

Related Questions