Reputation: 41
Let's say I have the following table of user listenings for a music platform:
id | user_id | started_at | finished_at |
---|---|---|---|
1 | 101 | '2017-10-05 08:38:22' | '2017-10-05 13:59:03' |
2 | 101 | '2017-10-05 15:15:30' | '2017-10-05 15:15:41' |
3 | 101 | '2017-10-05 15:15:46' | '2017-10-05 15:46:46' |
4 | 102 | '2017-10-12 13:45:27' | '2017-10-12 15:14:49' |
5 | 103 | '2017-10-10 12:21:19' | '2017-10-10 12:42:27' |
6 | 103 | '2017-10-10 12:45:56' | '2017-10-10 12:52:22' |
7 | 103 | '2017-10-10 14:25:54' | '2017-10-10 16:32:57' |
8 | 103 | '2017-10-10 16:35:01' | '2017-10-10 18:09:50' |
where id (int) is record of user that listened to music (or book or any other material), user_id (int) is a listener id, started_at (timestamp) is time when user started listening and finished_at is when user finished listening
What I need to do is find average time between the first and second listening by user. For example, for a first user with user_id = 101, it will be:
'2017-10-05 15:15:30' (started_at column, second row) - '2017-10-05 13:59:03' (finished_at, first row)
Which gives 1 hour and 15 minutes time difference.
For that case I wrote the following code:
SELECT user_id, (TIMESTAMPDIFF(SECOND, pDataDate, started_at)/3600)
FROM (
SELECT *,
LAG(finished_at) OVER (ORDER BY finished_at) pDataDate
FROM listenings
) q
WHERE pDataDate IS NOT NULL
My problem is in handling the cases in which there is only one user_id in the table (user_id = 102 in this example). It has only one row which means started_at and finished_at at the same column. I don't how to properly write a statement that combines both of the cases. Can someone suggest the query (may be with if/case statements) that covers both of the cases?
Also, I want to find a way to only take the average of the first two top rows of the group. Say, for user_id = 101, I will only take the average between rows 1 and 2.
Thanks in advance and sorry if I couldn't write it clearer. I will probably edit question for convenient reading
Upvotes: 0
Views: 104
Reputation: 1270391
Instead of LAG()
, use ROW_NUMBER()
. You can then use aggreagtion:
SELECT user_id,
TIMESTAMPDIFF(SECOND, MAX(finished_at), NULLIF(MIN(finished_at), MAX(finished_at))) / 3600
FROM (SELECT l.*,
ROW_NUMBER() OVER (PARTITION BY user_id ORJDER BY finished_at) as seqnum
FROM listenings l
) l
WHERE seqnum <= 2
GROUP BY user_id;
This returns NULL
if a user has only one row.
Upvotes: 2