Reputation: 14537
I've seen a lot of questions about this general error, but I don't get why I have it, maybe because of nested window functions...
With the below query, I get the error for Col_C
, Col_D
, ... and almost everything I tried
SQL compilation error: [eachColumn] is not a valid group by expression
SELECT
Col_A,
Col_B,
FIRST_VALUE(Col_C) IGNORE NULLS OVER (PARTITION BY Col_A, Col_B
ORDER BY Col_TimeStamp ASC
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
MAX(Col_D) OVER (PARTITION BY Col_A, Col_B
ORDER BY Col_TimeStamp ASC
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
FIRST_VALUE(CASE WHEN Col_T = 'testvalue'
THEN LAST_VALUE(Col_E) IGNORE NULLS OVER (PARTITION BY Col_A, Col_B
ORDER BY Col_TimeStamp DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
ELSE NULL END) IGNORE NULLS
OVER (PARTITION BY Col_A, Col_B
ORDER BY Col_TimeStamp ASC
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM mytable
So, is there a way to used nested window functions in Snowflake (with case when ...
) and if so, how/what am I doing wrong ?
Upvotes: 3
Views: 3012
Reputation: 175726
"I've seen a lot of questions about this general error, but I don't get why I have it, maybe because of nested window functions..."
Snowflake supports reusing expressions at the same level(sometimes called "lateral column alias reference" )
It is perfectly fine to write:
SELECT 1+1 AS col1,
col1 *2 AS col2,
CASE WHEN col1 > col2 THEN 'Y' ELSE 'NO' AS col3
...
In standard SQL you will either have to use multiple levels of query(cte) or use LATERAL JOIN. Related: PostgreSQL: using a calculated column in the same query
Unfortunately the same syntax will not work for analytic functions(and I am now aware of any RDMBS that supports it):
SELECT ROW_NUMBER() OVER(PARTITION BY ... ORDER BY ...) AS rn
,MAX(rn) OVER(PARTITION BY <different than prev) AS m
FROM tab;
In the SQL Standard 2016 there is optional feature: T619 Nested window functions.
Here an article how the nested analytic function query could look like: Nested window functions in SQL.
It means that current way to nest windowed function is usage of derived table/cte:
WITH cte AS (
SELECT ROW_NUMBER() OVER(PARTITION BY ... ORDER BY ...) AS rn
,*
FROM tab
)
SELECT *, MAX(rn) OVER(PARTITION BY <different than prev) AS m
FROM cte
Upvotes: 1
Reputation: 25968
So deconstructing your logic to show it's the second FIRST_VALUE that causes the problem
WITH data(Col_A,Col_B,Col_c,col_d, Col_TimeStamp, col_t,col_e) AS (
SELECT * FROM VALUES
(1,1,1,1,1,'testvalue',10),
(1,1,2,3,2,'value',11)
)
SELECT
Col_A,
Col_B,
FIRST_VALUE(Col_C) IGNORE NULLS OVER (PARTITION BY Col_A, Col_B
ORDER BY Col_TimeStamp ASC
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as first_c,
MAX(Col_D) OVER (PARTITION BY Col_A, Col_B
ORDER BY Col_TimeStamp ASC
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
LAST_VALUE(Col_E) IGNORE NULLS OVER (PARTITION BY Col_A, Col_B
ORDER BY Col_TimeStamp DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as last_e,
IFF(Col_T = 'testvalue', last_e, NULL) as if_test_last_e
/*,FIRST_VALUE(if_test_last_e) IGNORE NULLS OVER (PARTITION BY Col_A, Col_B
ORDER BY Col_TimeStamp ASC
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as the_problem*/
FROM data
ORDER BY Col_A,Col_B, col_timestamp
;
if we uncomment the_problem
we have it.. compare to PostgreSQL (my background) just getting to reuse so many prior results/steps is a gift, so here I just bust out another SELECT layer.
WITH data(Col_A,Col_B,Col_c,col_d, Col_TimeStamp, col_t,col_e) AS (
SELECT * FROM VALUES
(1,1,1,1,1,'testvalue',10),
(1,1,2,3,2,'value',11)
)
SELECT *,
FIRST_VALUE(if_test_last_e) IGNORE NULLS OVER (PARTITION BY Col_A, Col_B
ORDER BY Col_TimeStamp ASC
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as not_a_problem
FROM (
SELECT
Col_A,
Col_B,
FIRST_VALUE(Col_C) IGNORE NULLS OVER (PARTITION BY Col_A, Col_B
ORDER BY Col_TimeStamp ASC
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as first_c,
MAX(Col_D) OVER (PARTITION BY Col_A, Col_B
ORDER BY Col_TimeStamp ASC
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
LAST_VALUE(Col_E) IGNORE NULLS OVER (PARTITION BY Col_A, Col_B
ORDER BY Col_TimeStamp DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as last_e,
IFF(Col_T = 'testvalue', last_e, NULL) as if_test_last_e
,Col_TimeStamp
FROM data
)
ORDER BY Col_A,Col_B, Col_TimeStamp
And then it all works. This also happens if you LAG then IFF/FIRST_VALUE and then LAG that second result.
Upvotes: 2