Reputation: 517
My data can be partitioned into an ordered set of events in time for each item of interest. Looks like this:
item_id state time window
---------- -------- ----- --------
a1 start t1 w1
a1 stall t2 w1
a1 restart t3 w1
a1 stall t4 w1
a1 restart t5 w1
a1 stop t6 w1
a2 start t9 w2
a2 stop t10 w2
a2 start t11 w2
a2 stop t12 w2
In this example, the times are arranged in chronological order (t(n) is later than t(n-1)). The 'window' column is illustrative only of the partition definition below, and isn't actually in the data. I only know how to partition this as follows:
window w over (partition by item_id order by time)
This window will always be bounded by the earliest 'start' and the latest state in the data, but there may many 'stop' states that occur before the last recorded state. What I want to do now is subdivide this window into intervals that begin with the 'start' time and end with the next 'stop' time, if there is one, or the last state following a start if not.
My approach so far: pull out all the 'start' records, order them by time, and generate a sequence number for them. That sequence number (I'm actually just loading those start records into a temp table that has a serial primary key on it) is then unique for every interval I want to identify.
So, I now have data that looks like this:
item_id state time interval_id
---------- -------- ----- ------------
a1 start t1 1
a1 stall t2 NULL
a1 restart t3 NULL
a1 stall t4 NULL
a1 restart t5 NULL
a1 stop t6 NULL
a2 start t9 2
a2 stop t10 NULL
a2 start t11 3
a2 stop t12 NULL
In other words, sure, I can identify the first member of each of my 'sub-window' intervals, but I still can't find a reasonable way to tag the other members of these sets with the same interval id. I'm getting wrapped around the axle trying define the other members of each set--the original problem.
The whole reason for tagging the sets with an identifier is really so I can use that identifier for windowing the data based on the real window definition: "starts with 'start', ends with 'end', is ordered by time".
If the problem is clear, how to approach?
Upvotes: 1
Views: 45
Reputation: 9191
If the 'stop' state is always followed by the 'start' state, you could use a running sum that increased by 1 whenever a 'start' state is found:
select *,
sum(1) filter (where state='start') over (order by replace(time, 't','')::int) interval_id
from tbl
order by replace(time, 't','')::int
replace(time, 't','')::int
is used to get the correct order of time, if you are using real timestamps just use order by time
.
Upvotes: 1