psrpsrpsr
psrpsrpsr

Reputation: 457

How can I return rows that meet a specific sequence of events?

I am trying to pull records for UserIDs that meet a certain sequence of events. If a user has a JOIN, then a subsequent CANCEL, and then a subsequent JOIN, I want to return them in the result set. I need to run this query for one day at a time, or several days at a time, as needed.

The table below shows examples of UserIDs that meet and do not meet the sequence.

+--------+--------+---------------------+------------+------------------+
| rownum | UserID |      Timestamp      | ActionType | Return in query? |
+--------+--------+---------------------+------------+------------------+
|      1 |  12345 | 2016-11-01 08:25:39 | JOIN       | yes              |
|      2 |  12345 | 2016-11-01 08:27:00 | NULL       | yes              |
|      3 |  12345 | 2016-11-01 08:28:20 | DOWNGRADE  | yes              |
|      4 |  12345 | 2016-11-01 08:31:34 | NULL       | yes              |
|      5 |  12345 | 2016-11-01 08:32:44 | CANCEL     | yes              |
|      6 |  12345 | 2016-11-01 08:45:51 | NULL       | yes              |
|      7 |  12345 | 2016-11-01 08:50:57 | JOIN       | yes              |
|      1 |   9876 | 2016-11-01 16:05:42 | JOIN       | yes              |
|      2 |   9876 | 2016-11-01 16:07:33 | CANCEL     | yes              |
|      3 |   9876 | 2016-11-01 16:09:09 | JOIN       | yes              |
|      1 |  56565 | 2016-11-01 18:15:16 | JOIN       | no               |
|      2 |  56565 | 2016-11-01 19:22:25 | CANCEL     | no               |
|      3 |  56565 | 2016-11-01 20:05:05 | CANCEL     | no               |
|      1 |  34343 | 2016-11-01 05:32:56 | JOIN       | no               |
+--------+--------+---------------------+------------+------------------+

I have read up on gaps and islands, and looked at all sorts of complicated forum posts that dance around what I'm trying to achieve.

Currently, all I'm able to do is look at one day's worth of records, with no constraint on the sequence logic that I need:

SELECT
    ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY tmsmp) rownum
    ,UserID
    ,tmstmp
    ,ActionType
FROM
    t
    INNER JOIN  (
                SELECT UserID
                FROM t
                WHERE tmstmp BETWEEN '2016-11-20 00:00:01' AND '2016-11-20 11:59:59'
                GROUP BY UserID
                HAVING COUNT(*) >= 2
                ) AS sub ON t1.UserID = sub.UserID

Thank you for your input!

Upvotes: 3

Views: 857

Answers (3)

sagi
sagi

Reputation: 40491

You can use LEAD() :

SELECT * FROM (
    SELECT t.* ,
           LAG(t.ActionType,1) OVER(PARTITION BY t.userid ORDER BY t.timestamp) AS LAST_ACTION,
           LAG(t.ActionType,2) OVER(PARTITION BY t.userid ORDER BY t.timestamp) AS LAST_ACTION2,
           LEAD(t.ActionType,1) OVER(PARTITION BY t.userid ORDER BY t.timestamp) AS NEXT_Action,
           LEAD(t.ActionType,2) OVER(PARTITION BY t.userid ORDER BY t.timestamp) AS NEXT_Action2
    FROM YourTable t
    WHERE tmstmp BETWEEN <Start> AND <End>)
WHERE (t.actionType = 'JOIN' AND
      t.NEXT_Action = 'Cancel' AND
      t.NEXT_Action2 = 'JOIN')
  OR (t.LAST_ACTION= 'JOIN' AND
      t.actionType= 'Cancel' AND
      t.NEXT_Action = 'JOIN')
  OR (t.LAST_ACTION2= 'JOIN' AND
      t.LAST_ACTION = 'Cancel' AND
      t.actionType= 'JOIN')

Upvotes: 4

Mark Adelsberger
Mark Adelsberger

Reputation: 45819

In my sample queries I'll do the best I can with the information you've given, but you're a little unclear about what the source table(s) look like. You show one table above (with no name), but then reference two different tables in your sample query... a little hard to see what's going on.

So I'll assume a single table, named t, and you can adjust as needed...

Then how I would handle this, is first identify the users

select distinct userid
  from            t first_join
       inner join t cancel
               on first_join.tmstmp < cancel.tmstp
              and first_join.userid = cancel.userid
       inner join t.second_join
               on second_join.tmstmp > cancel.tmstp
              and second_join.userid = cancel.userid
 where first_join.actiontype = 'JOIN'
   and cancel.actiontype = 'CANCEL'
   and second_join.actiontype = 'JOIN'

So now you can get all records for those users

SELECT *
  FROM T
 WHERE USERID IN (
    select distinct userid
      from            t first_join
           inner join t cancel
                   on first_join.tmstmp < cancel.tmstp
                  and first_join.userid = cancel.userid
           inner join t.second_join
                   on second_join.tmstmp > cancel.tmstp
                  and second_join.userid = cancel.userid
     where first_join.actiontype = 'JOIN'
       and cancel.actiontype = 'CANCEL'
       and second_join.actiontype = 'JOIN'
     )

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1270993

Assuming you mean that the records are in order with no gaps, just use lag(), lead() or a combination:

select distinct userId
from (select t.*,
             lag(ActionType) over (partition by userId order by tmstamp) as prev_at,
             lead(ActionType) over (partition by userId order by tmstamp) as next_at,
      from t
     ) t
where ActionType = 'Cancel' and prev_at = 'Join' and next_at = 'Join';

If gaps are allowed, then you can do this differently:

select distint userid
from t
where ActionType = 'Cancel' and
      exists (select 1
              from t t2
              where t2.userId = t.userId and
                    t2.at = 'Join' and
                    t2.tmstamp < t.tmstamp
             ) and
      exists (select 1
              from t t2
              where t2.userId = t.userId and
                    t2.at = 'Join' and
                    t2.tmstamp > t.tmstamp
             );

Upvotes: 1

Related Questions