Nicole L.
Nicole L.

Reputation: 106

Is this a bug, or does Snowflake not fully support correlated subqueries in a WHERE EXISTS clause?

Snowflake is throwing an error for an EXISTS clause if a filter condition depends on coalescing columns from both the outer table and the subquery table. The query will run if I remove the outer-table column from the COALESCE or replace the COALESCE with the long-form equivalent logic.

I'm seeing this error, specifically SQL compilation error: Unsupported subquery type cannot be evaluated, for what I would consider to be a fairly straightforward WHERE EXISTS clause. This would work in every (recent) SQL variant that I've used (e.g., SQL Server, Postgres), so I'm a little concerned that Snowflake doesn't support it. Am I missing something?

I found what seems to be a similar question at Snowflake's Community in 2019, where Snowflake was failing when the EXISTS clause included a WHERE filter condition that referenced a column from an outer query for something other than joining the tables. There was not a clear solution there.

Snowflake's documentation on its limited support for subqueries says that it supports both correlated and uncorrelated subqueries for "EXISTS, ANY / ALL, and IN subqueries in WHERE clauses".

So why is it failing on this EXISTS clause? Is what I'm seeing a bug, or is this a Snowflake limitation that is not clearly documented?

Code to reproduce the issue:

CREATE OR REPLACE TEMPORARY TABLE Employee (
   Emp_SK INT NOT NULL
);

CREATE OR REPLACE TEMPORARY TABLE Employee_X_Pay_Rate (
   Emp_SK INT NOT NULL, Pay_Rate_SK INT NOT NULL, Start_Date TIMESTAMP_NTZ NOT NULL, End_Date TIMESTAMP_NTZ NOT NULL
);

CREATE OR REPLACE TEMPORARY TABLE Employee_X_Location (
   Emp_SK INT NOT NULL, Location_SK INT NOT NULL, Start_Date TIMESTAMP_NTZ NOT NULL, End_Date TIMESTAMP_NTZ NULL
);
INSERT INTO Employee
VALUES (1);

INSERT INTO Employee_X_Pay_Rate 
VALUES 
    (1, 1, '2018-01-01', '2019-03-31')
   ,(1, 2, '2019-04-01', '2021-03-31')
   ,(1, 3, '2021-04-01', '2099-12-31')
;

INSERT INTO Employee_X_Location
VALUES
    (1, 101, '2018-01-01', '2019-12-31')
   ,(1, 102, '2020-01-01', '2020-12-31')
   ,(1, 103, '2021-01-01', NULL)
;
SET Asof_Date = TO_DATE('2021-05-31', 'yyyy-mm-dd'); -- changing this to TO_TIMESTAMP makes no difference
SELECT 
   emp.Emp_SK
   ,empPay.Pay_Rate_SK
   ,$Asof_Date AS Report_Date
   ,empPay.Start_Date AS Pay_Start_Date
   ,empPay.End_Date AS Pay_End_Date
FROM Employee emp
   INNER JOIN Employee_X_Pay_Rate empPay
      ON emp.Emp_SK = empPay.Emp_SK
      AND $Asof_Date BETWEEN empPay.Start_Date AND empPay.End_Date
WHERE EXISTS (
   SELECT 1 FROM Employee_X_Location empLoc
   WHERE emp.Emp_SK = empLoc.Emp_SK
      -- Issue: Next line fails. empLoc.End_Date can be null
      AND $Asof_Date BETWEEN empLoc.Start_Date AND COALESCE(empLoc.End_Date, empPay.End_Date)
);

The query will run if I replace the issue line with either of the following.

-- Workaround 1
AND (
   $Asof_Date >= empLoc.Start_Date
   AND ($Asof_Date <= empLoc.End_Date OR (empLoc.End_Date IS NULL AND $Asof_Date <= empPay.End_Date))
)

-- Workaround 2
AND $Asof_Date BETWEEN empLoc.Start_Date AND COALESCE(empLoc.End_Date, CURRENT_DATE)

Upvotes: 9

Views: 5456

Answers (2)

Happygolucky
Happygolucky

Reputation: 111

I have been looking into this and it looks like Snowflake supports correlated subquery in where clause if it is convinced that the inner subquery returns a scalar result. So in short, using functions like COUNT, ANY_VALUE or DISTINCT can give the result. Consider the following query -

SELECT * FROM Department D 
INNER JOIN  (select * from Employee) E
ON D.DepartmentID = E.DepartmentID 
where EXISTS( SELECT **distinct**  1 FROM EMPLOYEES E1 WHERE E1.DEPARTMENTID=D.DEPARTMENTID );    

(ignore the similar tables, they were used for testing). My requirement was to have TOP 1 in the WHERE EXISTS clause. However, since Snowflake does not support it, I managed the same using DISTINCT.

That being said, your query might work with the same WHERE EXISTS(SELECT DISTINCT 1.... as you need just to know whether records exist or not.

References to [Snowflake Community Forum-]:(https://community.snowflake.com/s/question/0D53r00009mIxwYCAS/sql-compilation-error-unsupported-subquery-type-cannot-be-evaluated?t=1626899830543) (https://community.snowflake.com/s/question/0D50Z00008BDZz0SAH/subquery-in-select-clause)

Upvotes: 1

Simeon Pilgrim
Simeon Pilgrim

Reputation: 25893

I see this still happens, and I just notice you already know about swapping empPay.End_Date to CURRENT_DATE which is how I would have written it.

It does make the correlated sub query more complex because now you are mixing in two tables instead of one.

when CURRENT_DATE is used the SQL is the same as:

SELECT 
    s.emp_sk
    ,ep.pay_rate_sk
    ,TO_DATE('2021-05-31') AS report_date
    ,ep.start_date AS pay_start_date
    ,ep.end_date AS pay_end_date
FROM (
    SELECT 
        e.emp_sk
    FROM employee e
    WHERE EXISTS (
        SELECT 1 
        FROM employee_x_location AS el
        WHERE e.emp_sk = el.emp_sk
            AND TO_DATE('2021-05-31') BETWEEN el.start_date AND COALESCE(el.end_date, CURRENT_DATE)
    )
) AS s
JOIN employee_x_pay_rate AS ep
    ON s.emp_sk = ep.emp_sk
        AND TO_DATE('2021-05-31') BETWEEN ep.start_date AND ep.end_date;

so demonstrating the correlation is complex verse simple can be shown, by swapping employee table with employee_x_pay_rate in the sub-select, like so:

SELECT 
    e.emp_sk
FROM Employee_X_Pay_Rate e
WHERE EXISTS (
    SELECT 1 
    FROM employee_x_location AS el
    WHERE e.emp_sk = el.emp_sk
        AND TO_DATE('2021-05-31') BETWEEN el.start_date AND COALESCE(el.end_date, CURRENT_DATE)
)

works, but use the value from that table does not:

SELECT 
    e.emp_sk
FROM Employee_X_Pay_Rate e
WHERE EXISTS (
    SELECT 1 
    FROM employee_x_location AS el
    WHERE e.emp_sk = el.emp_sk
        AND TO_DATE('2021-05-31') BETWEEN el.start_date AND COALESCE(el.end_date, e.End_Date)
)

sign IFNULL(el.end_date, e.End_Date) and NVL(el.end_date, e.End_Date) both fail also.

But you can restruct the code to move the COALESCE into a CTE, and then use the WHERE EXISTS like so:

WITH r_emp_pay AS (
    SELECT 
       empPay.Emp_SK
       ,empPay.Pay_Rate_SK
       ,empPay.Start_Date 
       ,empPay.End_Date
    FROM Employee_X_Pay_Rate AS empPay
    WHERE TO_DATE('2021-05-31', 'yyyy-mm-dd') BETWEEN empPay.Start_Date AND empPay.End_Date
), r_emp_loc AS (
    SELECT 
        empLoc.Emp_SK
        ,empLoc.Start_Date
        ,empLoc.End_Date
        ,COALESCE(empLoc.End_Date, empPay.End_Date) as col_end_date
    FROM Employee_X_Location empLoc
    JOIN r_emp_pay empPay
        ON empPay.Emp_SK = empLoc.Emp_SK
    WHERE TO_DATE('2021-05-31', 'yyyy-mm-dd') BETWEEN empLoc.Start_Date AND COALESCE(empLoc.End_Date, CURRENT_DATE)
)
SELECT 
   emp.Emp_SK
   ,empPay.Pay_Rate_SK
   ,TO_DATE('2021-05-31', 'yyyy-mm-dd') AS Report_Date
   ,empPay.Start_Date AS Pay_Start_Date
   ,empPay.End_Date AS Pay_End_Date
FROM Employee emp
JOIN r_emp_pay empPay
    ON emp.Emp_SK = empPay.Emp_SK
WHERE EXISTS (
   SELECT 1 FROM r_emp_loc empLoc
   WHERE emp.Emp_SK = empLoc.Emp_SK
      AND TO_DATE('2021-05-31', 'yyyy-mm-dd') BETWEEN empLoc.Start_Date AND empLoc.col_end_date
);

gives:

EMP_SK PAY_RATE_SK REPORT_DATE PAY_START_DATE PAY_END_DATE
1 3 2021-05-31 2021-04-01 00:00:00.000 2099-12-31 00:00:00.000

Upvotes: 1

Related Questions