Reputation: 15
I have a data set that looks like this (ordered by date):
date | value | first_id | second_id |
---|---|---|---|
2020-01-01 | 10 | 1 | 1 |
2020-01-02 | 15 | 1 | 1 |
2020-01-03 | 5 | 1 | 2 |
2020-01-04 | 75 | 2 | 2 |
2020-01-05 | 101 | 2 | 2 |
2020-01-06 | 12 | 1 | 1 |
2020-01-07 | 5 | 1 | 1 |
2020-01-08 | 14 | 1 | 2 |
I need to get an aggregation when values are the same for the same first_id and second_id in a sequence, lets say max(value), so I can get:
max_value | first_id | second_id |
---|---|---|
15 | 1 | 1 |
5 | 1 | 2 |
101 | 2 | 2 |
12 | 1 | 1 |
14 | 1 | 2 |
If you do max(value) and group by, same first_id and second_id combinations will give just one row (regardless of date ordering). I was thinking to add RANK when one of ids changes, e.g:
date | value | first_id | second_id | rank |
---|---|---|---|---|
2020-01-01 | 10 | 1 | 1 | 1 |
2020-01-02 | 15 | 1 | 1 | 1 |
2020-01-03 | 5 | 1 | 2 | 2 |
2020-01-04 | 75 | 2 | 2 | 3 |
2020-01-05 | 101 | 2 | 2 | 3 |
2020-01-06 | 12 | 1 | 1 | 4 |
2020-01-07 | 5 | 1 | 1 | 4 |
2020-01-08 | 14 | 1 | 2 | 5 |
But I don't know how to get that rank as well since same id combinations are considered together.
Upvotes: 1
Views: 61
Reputation: 1271003
You can use lag()
and a cumulative sum to define the groups and then aggregate. You can see the groups if you run this query:
select t.*,
sum(case when prev_date = prev_date2 then 0 else 1 end) over (order by date) as grp
from (select t.*,
lag(date) over (order by date) as prev_date,
lag(date) over (partition by first_id, second_id order by date) as prev_date2
from t
) t;
The logic is saying that a new group starts when the previous date does not have the same values of the two id columns.
Then the aggregation is:
with grps as (
select t.*,
sum(case when prev_date = prev_date2 then 0 else 1 end) over (order by date) as grp
from (select t.*,
lag(date) over (order by date) as prev_date,
lag(date) over (partition by first_id, second_id order by date) as prev_date2
from t
) t
)
select first_id, second_id, max(value), min(date), max(date)
from grps
group by grp
Upvotes: 1