mls.z
mls.z

Reputation: 53

Pairing Sequential Events on Postgresql

We are logging the main flows of actions our users make on our iPad app on a table. Each flow has a start(tagged Started) and an end that is either tagged Cancelled or Finished, and there shouldn't be any overlapping events.

A set of flows Started, Cancelled or Finished for a user looks like this:

user_id             timestamp                   event_text      event_num
[email protected]   2016-10-30 00:08:00.966+00  Flow Started    0
[email protected]   2016-10-30 00:08:15.58+00   Flow Cancelled  2
[email protected]   2016-10-30 00:08:15.581+00  Flow Started    0
[email protected]   2016-10-30 00:34:44.134+00  Flow Finished   1
[email protected]   2016-10-30 00:42:26.102+00  Flow Started    0
[email protected]   2016-10-30 00:42:49.276+00  Flow Cancelled  2
[email protected]   2016-10-30 00:42:49.277+00  Flow Started    0
[email protected]   2016-10-30 00:59:47.337+00  Flow Cancelled  2
[email protected]   2016-10-30 00:59:47.337+00  Flow Started    0
[email protected]   2016-10-30 00:59:47.928+00  Flow Cancelled  2

We want to calculate how long a cancelled and finished flow last on average. For this we need to pair event Started with Canceled or Finished. The following code does that, however can't work around the following data quality issue that we have:

We get this result:

user_id             start                       end                         action
[email protected]   2016-10-30 00:08:00.966+00  2016-10-30 00:08:15.58+00   2
[email protected]   2016-10-30 00:08:15.581+00  2016-10-30 00:34:44.134+00  1
[email protected]   2016-10-30 00:42:26.102+00  2016-10-30 00:42:49.276+00  2
[email protected]   2016-10-30 00:42:49.277+00  NULL                        NULL
[email protected]   2016-10-30 00:59:47.337+00  2016-10-30 00:59:47.337+00  2
[email protected]   NULL                        2016-10-30 00:59:47.928+00  2

But we should get this:

user_id             start                       end                         action
[email protected]   2016-10-30 00:08:00.966+00  2016-10-30 00:08:15.58+00   2
[email protected]   2016-10-30 00:08:15.581+00  2016-10-30 00:34:44.134+00  1
[email protected]   2016-10-30 00:42:26.102+00  2016-10-30 00:42:49.276+00  2
[email protected]   2016-10-30 00:42:49.277+00  2016-10-30 00:59:47.337+00  2
[email protected]   2016-10-30 00:59:47.337+00  2016-10-30 00:59:47.928+00  2

How do I need to alter the code so that the pairing is correct?

Upvotes: 5

Views: 144

Answers (1)

David דודו Markovitz
David דודו Markovitz

Reputation: 44941

select      user_id       
           ,"start"                       
           ,"end"                         
           ,"action"

from       (select      user_id
                       ,timestamp                 as "start"
                       ,lead (event_num)   over w as "action"
                       ,lead ("timestamp") over w as "end"
                       ,event_num

            from        tracks t

            window      w as (partition by user_id order by "timestamp",event_num desc)
            ) t

where       t.event_num = 0
;

Upvotes: 2

Related Questions