Reputation: 786
I have observed some unexpected behaviour in AWS Redshift. Below are the examples that illustrate it.
This query:
WITH var_table AS (
SELECT
8 AS x
)
SELECT * FROM real_table
WHERE real_table_col_1 = (select x from var_table)
AND real_table_col_2 = (select x from var_table)
fails with:
ERROR: Assert
Detail:
-----------------------------------------------
error: Assert
code: 1000
context: query->a_last_plan()->m_locus == LocusXNode -
query: 48508061
location: xen_execute.cpp:8916
process: padbmaster [pid=2659]
-----------------------------------------------
This query works normally:
create temporary table var_table AS (
SELECT
8 AS x
);
SELECT * FROM real_table
WHERE real_table_col_1 = (select x from var_table)
AND real_table_col_2 = (select x from var_table)
And also this query works normally:
WITH var_table AS (
SELECT
8 AS x
)
SELECT * FROM real_table
WHERE real_table_col_1 = (select x from var_table)
Only difference between this one and the first one is that I extract x
only once.
Is there any explanation for this and I just don't understand some nitty-gritties of Redshift or is it a legitimate bug?
EDIT:
The query format above is a simplified version which I thought was the minimal reproducible example. I have since found out that this issue doesn't happen in some simpler queries if I use the same pattern.
I will keep trying to find the minimal reproducible example for some time, but otherwise I will just keep the create temporary table
for now.
Upvotes: 0
Views: 1724
Reputation: 11092
Assert errors are due to some part of the query plan not making sense when the query is executed. Is it a bug? Possibly, but likely should be handled more gracefully.
I suspect that the error is arising due to the comparison of a value to a column with '=' and doing this multiple times and the table reference being a CTE (no meta data). The query planner is setting up for the general case for this type of comparison which could be mapped to an IN clause or a JOIN. As suspect it is trying to map to a pair of JOINs and things are going sideways.
A few things to try to see if it is such an issue:
Remove the unneeded second table scan (unnecessarily rescanning tables is not a best practice):
WITH var_table AS (
SELECT
8 AS x
)
SELECT * FROM real_table
WHERE real_table_col_1 = (select x from var_table)
AND real_table_col_2 = real_table_col_1;
Tell the query planner that you only want 1 row from this table:
WITH var_table AS (
SELECT
8 AS x
)
SELECT * FROM real_table
WHERE real_table_col_1 = (select top 1 x from var_table)
AND real_table_col_2 = (select top 1 x from var_table)
Move to an explicit JOIN syntax:
WITH var_table AS (
SELECT
8 AS x
)
SELECT * FROM real_table rt
JOIN var_table vt
ON rt.real_table_col_1 = vt.x
AND rt.real_table_col_2 = vt.x;
See if you can't recreate the issue by explicitly using redundant joins:
WITH var_table AS (
SELECT
8 AS x
)
SELECT * FROM real_table rt
JOIN vart_table vt1 ON rt.real_table_col_1 = vt1.x
JOIN vart_table vt2 ON rt.real_table_col_2 = vt2.x;
I'm not confident that this last query will fail but I expect it is similar to what the query planner is trying to do with your original query.
Upvotes: 2