Beginn_noob
Beginn_noob

Reputation: 41

Find average time between the first and second listening by users

Let's say I have the following table of user listenings for a music platform:

id user_id started_at finished_at
1 101 '2017-10-05 08:38:22' '2017-10-05 13:59:03'
2 101 '2017-10-05 15:15:30' '2017-10-05 15:15:41'
3 101 '2017-10-05 15:15:46' '2017-10-05 15:46:46'
4 102 '2017-10-12 13:45:27' '2017-10-12 15:14:49'
5 103 '2017-10-10 12:21:19' '2017-10-10 12:42:27'
6 103 '2017-10-10 12:45:56' '2017-10-10 12:52:22'
7 103 '2017-10-10 14:25:54' '2017-10-10 16:32:57'
8 103 '2017-10-10 16:35:01' '2017-10-10 18:09:50'

where id (int) is record of user that listened to music (or book or any other material), user_id (int) is a listener id, started_at (timestamp) is time when user started listening and finished_at is when user finished listening

What I need to do is find average time between the first and second listening by user. For example, for a first user with user_id = 101, it will be:

'2017-10-05 15:15:30' (started_at column, second row) - '2017-10-05 13:59:03' (finished_at, first row)

Which gives 1 hour and 15 minutes time difference.

For that case I wrote the following code:

   SELECT  user_id, (TIMESTAMPDIFF(SECOND, pDataDate, started_at)/3600)
   FROM    (
        SELECT  *,
                LAG(finished_at) OVER (ORDER BY finished_at) pDataDate
        FROM    listenings
    
        ) q
   WHERE   pDataDate IS NOT NULL

My problem is in handling the cases in which there is only one user_id in the table (user_id = 102 in this example). It has only one row which means started_at and finished_at at the same column. I don't how to properly write a statement that combines both of the cases. Can someone suggest the query (may be with if/case statements) that covers both of the cases?

Also, I want to find a way to only take the average of the first two top rows of the group. Say, for user_id = 101, I will only take the average between rows 1 and 2.

Thanks in advance and sorry if I couldn't write it clearer. I will probably edit question for convenient reading

Upvotes: 0

Views: 104

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1270391

Instead of LAG(), use ROW_NUMBER(). You can then use aggreagtion:

SELECT user_id,
        TIMESTAMPDIFF(SECOND, MAX(finished_at), NULLIF(MIN(finished_at), MAX(finished_at))) / 3600
FROM (SELECT l.*,
             ROW_NUMBER() OVER (PARTITION BY user_id ORJDER BY finished_at) as seqnum
      FROM listenings l    
     ) l
WHERE seqnum <= 2
GROUP BY user_id;

This returns NULL if a user has only one row.

Upvotes: 2

Related Questions