Reputation: 169
I have a dataset with conversation data between a User and a Customer Service. I need to find the response time (time difference) between after a user sent a message and first time the Customer Service responded to the message
What I struggle with is a user can send multiple consecutive message before the Customer Service can reply. If a user sent multiple consecutive message, I need to take the time of the last row, but if the customer service sent multiple consecutive message, I need to take the time of the first row
My expected result is as below:
So, from the 10 message_id in the base table, there will be 3 rows:
I've tried using lead function as follows but the result is not what I expected
select sender,
start_message,
start_reply,
date_diff('second',start_message,start_reply)/60.0 as response_time
from (
select message_id,
created_at as start_message,
lead(created_at) over(partition by sender order by created_at) as start_reply,
sender_type
from base_table
) t
order by 2 asc
I'm using athena, any help is greatly appreciated, thank you
Upvotes: 0
Views: 122
Reputation: 6758
You can shift all the columns by 1 and select all rows where sender is user
and shifted sender is cs
. This will always give you last message of user and first message of support. provided data is ordered
Select message_id, cs_message_id,extract(seconds from cs_created_at - created_at) as response_time from
(
Select *,
lead(sender, 1, sender) over (order by created_at) as cs_sender,
lead(message_id, 1, message_id) over (order by created_at) as cs_message_id,
lead(created_at, 1, created_at) over (order by created_at) as cs_created_at
from tbl_name
)
where cs_sender = 'cs' and sender = 'user';
Output -
message_id cs_message_id response_time
3 4 8.000000
7 8 7.000000
9 10 13.000000
Upvotes: 1
Reputation: 9201
One approach is to use a flag that is set to one whenever the sender is 'user' and the previous sender is 'cs', then use a running sum on that flag to group every two pairs of consecutive (user, cs) as a single group.
with t as
(
select *,
case when
sender ='user' and lag(sender, 1, sender) over (order by created_at) = 'cs'
then 1 else 0
end as flag
from tbl_name
),
grps as
(
select *,
sum(flag) over (order by created_at) as grp
from t
)
select max(case when sender='user' then message_id end) as message_id_user,
min(case when sender='cs' then message_id end) as message_id_cs,
date_diff('second', max(case when sender='user' then created_at end), min(case when sender='cs' then created_at end)) as response_time
from grps
group by grp
order by grp
Upvotes: 1