user11885406
user11885406

Reputation:

Snowflake SQL Range between interval

I have input table with following structure - ID,Date, Value.

I am trying to calculate minimum value in last 10 months for every record in dataset. For that I am using range between interval.

The code below is working fine in SPARK SQL but for some reason I can't use the same code in snowflake SQL. Appreciate if someone can guide me on how to modify the below code to run in Snowflake SQL.

select *,
min(avg_Value) OVER (
        PARTITION BY ID 
        ORDER BY CAST(Date AS timestamp)  
        RANGE BETWEEN INTERVAL 10 MONTHS PRECEDING AND CURRENT ROW) as min_value_in_last_10_months
from        
(
select  ID,
        Date,
        avg(Value) as avg_Value
from table
group by ID,Date
)

Upvotes: 2

Views: 6459

Answers (3)

Felipe Hoffa
Felipe Hoffa

Reputation: 59165

Good news, now Snowflake supports range - so the sample question can be run as this:

with data as(
 select o_orderkey id, o_orderdate date, o_totalprice value from snowflake_sample_data.tpch_sf1.orders limit 1000
)

select *,
min(avg_Value) OVER (
        PARTITION BY ID 
        ORDER BY CAST(Date AS timestamp)  
        RANGE BETWEEN INTERVAL '10 MONTHS' PRECEDING AND CURRENT ROW) as min_value_in_last_10_months
from        
(
select  ID,
        Date,
        avg(Value) as avg_Value
from data
group by ID,Date
)

Note I had to put the interval '10 months' between quotes to fix the query.

Upvotes: 1

user25416719
user25416719

Reputation: 1

RANGE BETWEEN is still not supported by Snowflake for sliding window functions, but I have posted a workaround that retains a similar structure using ROWS BETWEEN.

I'll start by outlining the problem. For sliding windows, Snowflake does not support RANGE BETWEEN. However, it does support ROWS BETWEEN. As long as these rows are systematic, then we have no issue utilizing the ROWS BETWEEN keyword.

If your data is taken on a monthly frequency, then you simply need to specify as follows.

select id, dt, 
       min(val) over(partition by id order by id, date rows between 9 and current row) as ten_mth_min
from inp_table

If your data is collected in a less systematic fashion, then you will first need to aggregate your data by month, then select values within a specified window by using a qualify statement.

with agg_inp_table as (
   select id, last_day(dt) as mthly_dt, min(val) as mthly_min_val
   from inp_table
   group by id, mthly_dt
   order by id, mthly_dt
)

select id, mthly_dt, 
       min(mthly_min_val) over(partition by id order by id, mthly_dt rows between 9 preceding and current row) as ten_mth_min
from agg_inp_table
qualify min(mthly_dt) over(partition by id order by id, mthly_dt rows between 9 preceding and current row) >= dateadd(months, -9, mthly_dt) and
        max(dt) over(partition by id order by id, mthly_dt rows between 9 preceding and current row) <= mthly_dt
order by id, mthly_dt

This process can apply to many different aggregations (min, max, avg, sum). Make sure that you double check the data though, if memory serves me right it will omit an output if the window isn't complete (missing dates and such).

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1269443

Snowflake supports lateral joins, so one method is:

select . . .
from t cross join lateral
     (select avg(t2.value) as avg_value
      from t t2
      where t2.id = t.id and
            t2.date >= t.date - interval 10 month and
            t2.date <= t.date
     ) a

Upvotes: 3

Related Questions