Reputation: 239
I have data with id, timestamp(ts) event, capital_event_bool, and prev_event_capital_bool.
id ts event capital_event_bool prev_event_capital_bool
001 00:01 a 0 0
002 00:02 b 0 0
002 00:03 b 0 0
002 00:04 b 1 0
002 00:05 c 0 1
003 00:03 c 0 0
003 00:04 b 0 0
003 00:05 b 1 0
003 00:06 b 0 1
003 00:07 b 0 0
003 00:08 b 1 0
Only "b" events can have a capital_event_bool = True. What I would like to accomplish is have a way to count all capital_event_bool = False b events prior to every capital_event_bool = True event for every id. I originally thought I could accomplish this via the row_number() window function in Redshift with
ROW_NUMBER() OVER (PARTITION BY id, event, capital_event_bool ORDER BY ts) AS row_num
but the part that is tripping me up is how to get the count to restart after every capital_event_bool = True event. It is fine if the row numbering will stop at every capital_event_bool = True event and then restart because I can just use a case statement with the capital_event_bool to reach my final result.
row_num DESIRED only row_num Final Desired Result
1 1 0
1 1 0
2 2 0
1 3 2
1 1 0
2 2 0
1 1 0
1 2 1
2 1 0
3 2 0
2 3 2
Upvotes: 0
Views: 1135
Reputation: 1269503
This is a type of gap-and-islands problem. Basically, you need to define subsets of the data by the number of "1" in the "b" columns. For this purpose, an inverse sum of capital_event_bool
does exactly what you want. Then, you can use window functions on this group:
select t.*,
(case when capital_event_bool = 1
then sum( (event = 'b')::int ) over (partition by id, grp) - 1
else 0
end) as final_result
from (select t.*,
sum(capital_event_bool) over (partition by id order by ts desc) as grp
from t
) t
Upvotes: 1