Find time difference between non-consecutive rows depending on column values in Athena

Question

I have a dataset with conversation data between a User and a Customer Service. I need to find the response time (time difference) between after a user sent a message and first time the Customer Service responded to the message

What I struggle with is a user can send multiple consecutive message before the Customer Service can reply. If a user sent multiple consecutive message, I need to take the time of the last row, but if the customer service sent multiple consecutive message, I need to take the time of the first row

My expected result is as below:

So, from the 10 message_id in the base table, there will be 3 rows:

time difference between message_id 3 and 4
time difference between message_id 7 and 8
time difference between message_id 9 and 10

I've tried using lead function as follows but the result is not what I expected

 select sender,
        start_message,
        start_reply,
        date_diff('second',start_message,start_reply)/60.0 as response_time
from (
        select message_id,
                created_at as start_message,
                  lead(created_at) over(partition by sender  order by created_at) as start_reply,
                  sender_type
          from base_table
     ) t 
 order by 2 asc

I'm using athena, any help is greatly appreciated, thank you

ahmed · Accepted Answer

One approach is to use a flag that is set to one whenever the sender is 'user' and the previous sender is 'cs', then use a running sum on that flag to group every two pairs of consecutive (user, cs) as a single group.

with t as
(
  select *,
    case when 
      sender ='user' and lag(sender, 1, sender) over (order by created_at) = 'cs' 
      then 1 else 0 
    end as flag
 from tbl_name
),
grps as
(
  select *, 
    sum(flag) over (order by created_at) as grp
  from t
)
select max(case when sender='user' then message_id end) as message_id_user,
       min(case when sender='cs' then message_id end) as message_id_cs, 
       date_diff('second', max(case when sender='user' then created_at end), min(case when sender='cs' then created_at end)) as response_time
from grps
group by grp
order by grp

Find time difference between non-consecutive rows depending on column values in Athena

Answers (2)

Related Questions