Reputation: 253
I have data in a Redshift table with columns like product_id, price, and time_of_purchase. I want to start a new partition every time the price changes relative to the previous purchase. The price of an item may go back to an earlier value, but I need that to be a separate partition, e.g.:
Note the price was $2, then went up to $3, then went back to $2. If I do something like (partition by product_id, price order by time_of_purchase) then the last row gets partitioned with the top two, which I don't want. How can I do this correctly so I get three separate partitions?
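To make the problem concrete, here is a small Python sketch (not Redshift SQL) of what grouping on (product_id, price) does. The rows are the sample data typed up in one of the answers, so the timestamps are illustrative, and the variable names are assumptions made for this example.

```python
from collections import defaultdict

# Sample rows as (product_id, price, time_of_purchase).
# The timestamps come from the demo data in one of the answers.
rows = [
    (1, 2.00, "2020-09-14 09:00"),
    (1, 2.00, "2020-09-14 10:00"),
    (1, 3.00, "2020-09-14 11:00"),
    (1, 3.00, "2020-09-14 12:00"),
    (1, 2.00, "2020-09-14 13:00"),
]

# Grouping on (product_id, price) mirrors PARTITION BY product_id, price:
# rows with the same price share a group no matter when they occurred.
naive_groups = defaultdict(list)
for product_id, price, ts in rows:
    naive_groups[(product_id, price)].append(ts)

for key, timestamps in naive_groups.items():
    print(key, timestamps)
```

Only two groups come out, and the 13:00 purchase sits in the same group as the 09:00 and 10:00 purchases, which is exactly the unwanted behaviour.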
Upvotes: 1
Views: 97
Reputation: 6721
Unlike @Gordon Linoff, I prefer to do it step by step, using WITH clauses.
And, as I have said in several other posts: please add your example data in a copy-paste-ready format, so we don't have to retype it.
I like to post my examples as self-contained micro demos, with the input data already in the post, so everyone can play with them:
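WITH
-- your input, typed manually ....
indata(product_id,price,tm_of_p) AS (
          SELECT 1,2.00,TIMESTAMP '2020-09-14 09:00'
UNION ALL SELECT 1,2.00,TIMESTAMP '2020-09-14 10:00'
UNION ALL SELECT 1,3.00,TIMESTAMP '2020-09-14 11:00'
UNION ALL SELECT 1,3.00,TIMESTAMP '2020-09-14 12:00'
UNION ALL SELECT 1,2.00,TIMESTAMP '2020-09-14 13:00'
)
,
-- flag each row whose price differs from the previous row of the same product;
-- the first row has no predecessor (LAG is NULL), so it falls through to 0
with_change_counter AS (
  SELECT
    *
  , CASE WHEN LAG(price) OVER(PARTITION BY product_id ORDER BY tm_of_p) <> price
      THEN 1
      ELSE 0
    END AS chg_count
  FROM indata
)
-- the running sum of the flags gives one id per run of equal prices
SELECT
  product_id
, price
, tm_of_p
, SUM(chg_count) OVER(
    PARTITION BY product_id ORDER BY tm_of_p
    -- Redshift requires an explicit frame for an aggregate window function with ORDER BY
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS session_id
FROM with_change_counter;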
-- out  product_id | price |       tm_of_p       | session_id
-- out ------------+-------+---------------------+------------
-- out           1 |  2.00 | 2020-09-14 09:00:00 |          0
-- out           1 |  2.00 | 2020-09-14 10:00:00 |          0
-- out           1 |  3.00 | 2020-09-14 11:00:00 |          1
-- out           1 |  3.00 | 2020-09-14 12:00:00 |          1
-- out           1 |  2.00 | 2020-09-14 13:00:00 |          2
Upvotes: 0
Reputation: 1269503
Use lag() to get the previous value and then a cumulative sum:
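select t.*,
       -- the first row has a NULL prev_price, so it counts as a change and ids start at 1
       sum(case when prev_price = price then 0 else 1 end) over
           (partition by product_id
            order by time_of_purchase
            -- Redshift requires an explicit frame for an aggregate window function with ORDER BY
            rows between unbounded preceding and current row
           ) as partition_id
from (select t.*,
             lag(price) over (partition by product_id order by time_of_purchase) as prev_price
      from t
     ) t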
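For anyone who wants to see the lag-plus-cumulative-sum logic outside SQL, here is a minimal Python sketch of the same idea. The function name assign_partitions and the tuple layout are assumptions made for illustration, and the rows reuse the sample data from the other answer; this is not how Redshift executes the query.

```python
from itertools import groupby
from operator import itemgetter


def assign_partitions(rows):
    """Return rows with a partition id that increases on every price change.

    rows: iterable of (product_id, price, time_of_purchase) tuples.
    """
    result = []
    # Sort by product, then time: the counterpart of
    # PARTITION BY product_id ORDER BY time_of_purchase.
    ordered = sorted(rows, key=itemgetter(0, 2))
    for product_id, group in groupby(ordered, key=itemgetter(0)):
        prev_price = None  # plays the role of lag(price); None on the first row
        partition_id = 0   # running sum of the change flags
        for _, price, ts in group:
            # Same flag as the CASE expression: add 0 if unchanged, else 1.
            # The first row has no previous price, so it counts as a change.
            if prev_price != price:
                partition_id += 1
            result.append((product_id, price, ts, partition_id))
            prev_price = price
    return result


# Sample data (timestamps are illustrative).
rows = [
    (1, 2.00, "2020-09-14 09:00"),
    (1, 2.00, "2020-09-14 10:00"),
    (1, 3.00, "2020-09-14 11:00"),
    (1, 3.00, "2020-09-14 12:00"),
    (1, 2.00, "2020-09-14 13:00"),
]

for row in assign_partitions(rows):
    print(row)
```

The ids start at 1 here, as in the query above, because the first row has no previous price. What matters is that the second run of 2.00 gets its own id instead of being merged with the first.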
Upvotes: 3