SQLnoob

Reputation: 253

Redshift separate partitions with identical data by time

I have a Redshift table with columns such as product_id, price, and time_of_purchase. I want to start a new partition every time the price differs from the price of the previous purchase. The price of an item may go back to an earlier value, but I need that to be a separate partition too, e.g.:

[Image: sample rows with product_id, price, and time_of_purchase]

Note the price was $2, then went up to $3, then went back to $2. If I do something like (partition by product_id, price order by time_of_purchase) then the last row gets partitioned with the top two, which I don't want. How can I do this correctly so I get three separate partitions?

Upvotes: 1

Views: 97

Answers (2)

marcothesane

Reputation: 6721

Unlike @Gordon Linoff, I prefer to do it step by step, using WITH clauses.

And, as I have said several times in other posts: please add your example data in a copy-paste-ready format, so we don't have to retype it from an image.

I like to post my examples as self-contained micro demos, with the input data already in the query, so everyone can play with them:

WITH
-- your input, typed manually ....
indata(product_id,price,tm_of_p) AS (
          SELECT 1,2.00,TIMESTAMP '2020-09-14 09:00'
UNION ALL SELECT 1,2.00,TIMESTAMP '2020-09-14 10:00'
UNION ALL SELECT 1,3.00,TIMESTAMP '2020-09-14 11:00'
UNION ALL SELECT 1,3.00,TIMESTAMP '2020-09-14 12:00'
UNION ALL SELECT 1,2.00,TIMESTAMP '2020-09-14 13:00'
)
,
with_change_counter AS (
  SELECT
    *
  , CASE WHEN LAG(price) OVER(PARTITION BY product_id ORDER BY tm_of_p) <> price
     THEN 1
     ELSE 0
    END AS chg_count
  FROM indata
)
SELECT
  product_id
, price
, tm_of_p
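-- Redshift requires an explicit frame clause when an aggregate window function uses ORDER BY
, SUM(chg_count) OVER(PARTITION BY product_id ORDER BY tm_of_p ROWS UNBOUNDED PRECEDING) AS session_id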
FROM with_change_counter;
-- out  product_id | price |       tm_of_p       | session_id 
-- out ------------+-------+---------------------+------------
-- out           1 |  2.00 | 2020-09-14 09:00:00 |          0
-- out           1 |  2.00 | 2020-09-14 10:00:00 |          0
-- out           1 |  3.00 | 2020-09-14 11:00:00 |          1
-- out           1 |  3.00 | 2020-09-14 12:00:00 |          1
-- out           1 |  2.00 | 2020-09-14 13:00:00 |          2
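
Once every row has a session_id, each price run can be collapsed into a single row. The following is only a sketch for trying that out locally: it runs the same CTEs through Python's sqlite3 module, with SQLite (3.25 or later, for window functions) standing in for Redshift. The with_session CTE, the output column names, and the price_runs helper are my own hypothetical names, and the timestamps are plain strings because SQLite has no TIMESTAMP literal.

```python
import sqlite3

# SQLite stands in for Redshift here (an assumption, for local testing only).
QUERY = """
WITH indata(product_id, price, tm_of_p) AS (
            SELECT 1, 2.00, '2020-09-14 09:00'
  UNION ALL SELECT 1, 2.00, '2020-09-14 10:00'
  UNION ALL SELECT 1, 3.00, '2020-09-14 11:00'
  UNION ALL SELECT 1, 3.00, '2020-09-14 12:00'
  UNION ALL SELECT 1, 2.00, '2020-09-14 13:00'
),
with_change_counter AS (
  -- 1 on every row whose price differs from the previous row, else 0
  SELECT *,
         CASE WHEN LAG(price) OVER (PARTITION BY product_id ORDER BY tm_of_p) <> price
              THEN 1 ELSE 0
         END AS chg_count
  FROM indata
),
with_session AS (
  -- running total of the change flags = one id per uninterrupted price run
  SELECT product_id, price, tm_of_p,
         SUM(chg_count) OVER (PARTITION BY product_id ORDER BY tm_of_p
                              ROWS UNBOUNDED PRECEDING) AS session_id
  FROM with_change_counter
)
-- hypothetical follow-up step: one row per price run
SELECT product_id, session_id, price,
       MIN(tm_of_p) AS first_purchase,
       MAX(tm_of_p) AS last_purchase,
       COUNT(*)     AS purchases
FROM with_session
GROUP BY product_id, session_id, price
ORDER BY product_id, session_id
"""


def price_runs():
    """Return one tuple per uninterrupted run of the same price."""
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in price_runs():
        print(row)
```

With the five demo rows this should give three runs: two purchases at 2.00, two at 3.00, and a final single purchase at 2.00 that is kept apart from the first run.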

Upvotes: 0

Gordon Linoff

Reputation: 1269503

Use lag() to get the previous value and then a cumulative sum:

select t.*,
       sum(case when prev_price = price then 0 else 1 end) over 
           (partition by product_id order by time_of_purchase
            rows unbounded preceding  -- Redshift needs an explicit frame with order by
           ) as partition_id
from (select t.*,
             lag(price) over (partition by product_id order by time_of_purchase) as prev_price
      from t
     ) t
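
To see what this produces, here is a minimal sketch that runs the query against five sample rows (the same values as the demo data in the other answer) using Python's sqlite3 module. SQLite 3.25+ is only a local stand-in for Redshift, which is an assumption on my part, and the partition_ids helper is a hypothetical name.

```python
import sqlite3

# Sample rows: price goes 2 -> 3 -> back to 2 for a single product.
ROWS = [
    (1, 2.00, "2020-09-14 09:00"),
    (1, 2.00, "2020-09-14 10:00"),
    (1, 3.00, "2020-09-14 11:00"),
    (1, 3.00, "2020-09-14 12:00"),
    (1, 2.00, "2020-09-14 13:00"),
]

QUERY = """
select t.*,
       sum(case when prev_price = price then 0 else 1 end) over
           (partition by product_id order by time_of_purchase
            rows unbounded preceding) as partition_id
from (select t.*,
             lag(price) over (partition by product_id order by time_of_purchase) as prev_price
      from t
     ) t
order by product_id, time_of_purchase
"""


def partition_ids():
    """Load the sample rows into an in-memory table t and return partition_id per row."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table t (product_id integer, price real, time_of_purchase text)"
        )
        conn.executemany("insert into t values (?, ?, ?)", ROWS)
        # partition_id is the last column of each result row
        return [row[-1] for row in conn.execute(QUERY)]
    finally:
        conn.close()


if __name__ == "__main__":
    print(partition_ids())
```

The first row of each product has no previous price, so prev_price is NULL, the comparison is not true, and the else branch counts it as a change. That is why the ids start at 1 rather than 0, and the final $2 row gets its own id instead of joining the first two.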

Upvotes: 3
