michalrudko
michalrudko

Reputation: 1530

How to rank event times by a given time window using SQL?

I am struggling to find a way to rank the events using SQL. The goal is to increment the rank whenever an event occurred more than delta seconds (e.g. 1 second) from the previous observation. So far my attempt is shown below:

select a.event_time, a.user_name, a.object_name, a.rnk, case when a.ddif <= 1000 then 0 else 1 end as new_query,
            case when a.ddif <= 1000 then 0 else rnk end as new_rnk
from (
    select *, rank() OVER (PARTITION BY user_name ORDER BY event_time) AS rnk,
              date_diff('second',lag(event_time) OVER (PARTITION BY user_name ORDER BY event_time),event_time) as ddif
    from tmp
    ) a

But it gives me just the following results and I still don't know how to achieve the results in yellow (either of them works for me perfectly).

enter image description here

I'd appreciate any help with that.

Please note: I am using Presto DB, hence I am limited to this query engine.

Upvotes: 2

Views: 506

Answers (3)

Gordon Linoff
Gordon Linoff

Reputation: 1270391

Use lag() and a cumulative sum to define the groups. Then assign row numbers:

select t.*,
       row_number() over (partition by user_name, grp order by event_time) as seqnum
from (select t.*,
             sum(case when prev_et > event_time - interval '1' second
                      then 0 else 1
                 end) over (partition by user_name order by event_time) as grp
      from (select t.*,
                   lag(event_time) over (partition by user_name order by event_time) as prev_et
            from tmp t
           ) t
     ) t;

Upvotes: 1

michalrudko
michalrudko

Reputation: 1530

Thanks for all the good tips that pointed me the direction to the final solution which is:

select a.*, sum (case when a.ddif <= 1 then 0 else 1 end) over (partition by user_name order by event_time) as acc_rnk
    from (
        select *, date_diff('second',lag(event_time) OVER (PARTITION BY user_name ORDER BY event_time),event_time) as ddif
        from tmp
        ) a

Upvotes: 1

GMB
GMB

Reputation: 222582

You can use lag() and a window sum():

select
    t.*,
    sum(case when event_time <= lag_event_time + interval '1' second then 0 else 1 end) rnk 
from (
    select 
        t.*, 
        lag(event_time) over(order by event_time partition by user_name) lag_event_time
    from mytable t
) t

Upvotes: 1

Related Questions