Michi
Michi

Reputation: 5471

Invalid operation error with variables from CTE

DB-Fiddle

CREATE TABLE customers (
    id SERIAL PRIMARY KEY,
    order_date DATE,
    customer VARCHAR(255)
);

INSERT INTO customers
(order_date, customer)
VALUES 
('2020-04-10', 'user_01'),
('2020-04-15', 'user_02'),
('2020-05-18', 'user_03'),
('2020-05-26', 'user_04'),
('2020-06-03', 'user_05'),
('2020-06-05', 'user_06'),
('2020-06-24', 'user_07');

I am extracting the users per month with this query:

SELECT 
c.customer
FROM customers c
WHERE c.order_date BETWEEN '2020-06-01' AND '2020-06-30'

This query works without any issue.


Now I wanted to use the dates as variables and therefore switched to:

WITH time_variables AS 
(SELECT 
'2020-06-01'::date AS start_date,
'2020-06-30'::date AS end_date)

SELECT 
c.customer
FROM customers c
WHERE c.order_date BETWEEN (SELECT start_date FROM time_variables) AND (SELECT end_date FROM time_variables)

However, when I run this query I get error:

Invalid operation: This type of correlated subquery pattern is not supported due to internal error;

Do you have any idea why this error happens in amazon-redshift?
Is there any other way to use the dates as variable?

Upvotes: 0

Views: 121

Answers (3)

Bill Weiner
Bill Weiner

Reputation: 11032

This is happening because the query you have written uses a correlated subquery which is when a SELECT in a clause of another SELECT (like the SELECTs in your WHERE clause) can (possibly) resolve to different values for each row of the outer SELECT. In your case your CTE has only one row so can only produce the same values but the query structure doesn't require this. Correlated subquery structures are not supported on Redshift b/c resolving these correlations on massive data become prohibitively expensive (and therefore slow).

To resolve change the query to use JOIN. Like this:

WITH time_variables AS 
(SELECT 
'2020-06-01'::date AS start_date,
'2020-06-30'::date AS end_date)

SELECT 
c.customer
FROM customers c
CROSS JOIN time_variables v
WHERE c.order_date BETWEEN v.start_date AND v.end_date

This just joins the one row of the CTE to every row in customer. Now it is clear how start and end dates are combined with the data in customer. However if you are looking to put variables in your SQL there are better ways that don't include creating new tables structures.

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1269783

Does it work using join?

WITH time_variables AS (
      SELECT '2020-06-01'::date AS start_date, '2020-06-30'::date AS end_date
     )
SELECT c.customer
FROM customers c JOIN
     time_variables tv
     ON c.order_date BETWEEN tv.start_date AND tv.end_date;

Upvotes: 1

Karlheim
Karlheim

Reputation: 121

To use Variable you could use DECLARE

DECLARE @StartDate DATE = Datefromparts(2020,06,01)
DECLARE @EndDate DATE= Datefromparts(2020,06,30)

SELECT 
c.customer
FROM customers c
WHERE c.order_date BETWEEN @StartDate AND @EndDate

Upvotes: 0

Related Questions