How to Update Subsets of a Window (Partition)

Question

My data can be partitioned into an ordered set of events in time for each item of interest. Looks like this:

item_id    state    time  window
---------- -------- ----- --------
a1         start    t1    w1
a1         stall    t2    w1
a1         restart  t3    w1
a1         stall    t4    w1
a1         restart  t5    w1
a1         stop     t6    w1

a2         start    t9    w2
a2         stop     t10   w2
a2         start    t11   w2
a2         stop     t12   w2

In this example, the times are arranged in chronological order (t(n) is later than t(n-1)). The 'window' column is illustrative only of the partition definition below, and isn't actually in the data. I only know how to partition this as follows:

window w over (partition by item_id order by time)

This window will always be bounded by the earliest 'start' and the latest state in the data, but there may many 'stop' states that occur before the last recorded state. What I want to do now is subdivide this window into intervals that begin with the 'start' time and end with the next 'stop' time, if there is one, or the last state following a start if not.

My approach so far: pull out all the 'start' records, order them by time, and generate a sequence number for them. That sequence number (I'm actually just loading those start records into a temp table that has a serial primary key on it) is then unique for every interval I want to identify.

So, I now have data that looks like this:

item_id    state    time  interval_id
---------- -------- ----- ------------
a1         start    t1    1
a1         stall    t2    NULL
a1         restart  t3    NULL
a1         stall    t4    NULL
a1         restart  t5    NULL
a1         stop     t6    NULL

a2         start    t9    2
a2         stop     t10   NULL
a2         start    t11   3
a2         stop     t12   NULL

In other words, sure, I can identify the first member of each of my 'sub-window' intervals, but I still can't find a reasonable way to tag the other members of these sets with the same interval id. I'm getting wrapped around the axle trying define the other members of each set--the original problem.

The whole reason for tagging the sets with an identifier is really so I can use that identifier for windowing the data based on the real window definition: "starts with 'start', ends with 'end', is ordered by time".

If the problem is clear, how to approach?

ahmed · Accepted Answer

If the 'stop' state is always followed by the 'start' state, you could use a running sum that increased by 1 whenever a 'start' state is found:

select *,
  sum(1) filter (where state='start') over (order by replace(time, 't','')::int) interval_id
from tbl
order by replace(time, 't','')::int

replace(time, 't','')::int is used to get the correct order of time, if you are using real timestamps just use order by time.

See demo

How to Update Subsets of a Window (Partition)

Answers (1)

Related Questions