SQL average time between first and second row in a set

Question

Ouch! My heard hurts .. thought I had it a few times but epic fail :(

I have the following data, millions of rows, indexed, MySQL 5.6.

In this table, there are sets of data, and the uuid is basically a unique id for each set of data.

I need to find the AVERAGE in the data for each 1st and 2nd rows in each set. In other words, how much time past since the set was created with the first insert and the second insert for that same set and then the average of the results.

I can get the average no problem, I just can't seem to get my head around a way to just get the time difference between 1st and 2nd row in each set.

I'm not even going to embarrass myself and paste my broken SQL with my misguided attempts using sub-queries and LIMIT, suffice to say, this one escapes me.

Any help appreciated, beers on me :/

+------+-----------------------------------------+----------------------------+ | id | uuid | stamp | +------+-----------------------------------------+----------------------------+ | 707 | 60b5-d062-5829-c11d-5b71-5d85-075b-a3c5 | 2020-01-01 17:00:28.000000 | | 708 | 60b5-d062-5829-c11d-5b71-5d85-075b-a3c5 | 2020-01-01 17:01:30.000000 | | 709 | 0ccf-94e0-ce72-8092-1975-5bea-6131-c719 | 2020-01-02 14:11:48.000000 | | 710 | 59c8-60ee-d172-511a-a477-c637-6789-f14a | 2020-01-02 14:23:36.000000 | | 711 | b33b-7584-1fed-e138-28ba-c24a-9b46-88e7 | 2020-01-02 14:24:07.000000 | | 712 | eddc-b12a-5ef2-baea-cf53-7287-5805-d922 | 2020-01-02 14:24:26.000000 | | 713 | 257b-fc66-6d7a-ba21-727e-1da7-0ee1-714c | 2020-01-02 14:25:31.000000 | | 718 | c5d9-acba-9a12-aacb-cf45-c5a9-2b8d-314c | 2020-01-02 15:46:41.000000 | | 719 | 0ccf-94e0-ce72-8092-1975-5bea-6131-c719 | 2020-01-02 15:55:42.000000 | | 720 | c5d9-acba-9a12-aacb-cf45-c5a9-2b8d-314c | 2020-01-02 15:56:33.000000 | | 722 | c5d9-acba-9a12-aacb-cf45-c5a9-2b8d-314c | 2020-01-02 16:16:14.000000 | | 723 | c5d9-acba-9a12-aacb-cf45-c5a9-2b8d-314c | 2020-01-02 16:21:25.000000 | | 726 | 6610-a9df-358d-0065-beb8-cea1-82a6-3258 | 2020-01-02 17:16:33.000000 | | 727 | 6610-a9df-358d-0065-beb8-cea1-82a6-3258 | 2020-01-02 17:21:20.000000 | | 728 | 6610-a9df-358d-0065-beb8-cea1-82a6-3258 | 2020-01-02 17:45:07.000000 | | 729 | 6610-a9df-358d-0065-beb8-cea1-82a6-3258 | 2020-01-02 17:50:17.000000 | | 730 | 6610-a9df-358d-0065-beb8-cea1-82a6-3258 | 2020-01-02 18:14:02.000000 | | 731 | 6610-a9df-358d-0065-beb8-cea1-82a6-3258 | 2020-01-02 18:27:48.000000 | | 732 | 6610-a9df-358d-0065-beb8-cea1-82a6-3258 | 2020-01-02 18:28:57.000000 | | 733 | c193-a46f-1104-3ee3-7387-94a8-ef32-a85e | 2020-01-02 18:40:40.000000 | | 734 | c193-a46f-1104-3ee3-7387-94a8-ef32-a85e | 2020-01-02 18:40:49.000000 |

Gordon Linoff · Accepted Answer

If a user id only appears twice, then this is trivial. You have millions of rows, so let's try to avoid a sort and assume you have the right indexes.

Here is one way to get the earliest two rows:

select t.*
from t
where t.stamp <= (select t2.stamp
                  from t t2
                  where t2.uuid = t.uuid
                  order by t2.stamp asc
                  limit 1,1
                 );

Very important: You want an index on (uuid, stamp) for any hope of performance.

Then, just aggregate:

select uuid, timestampdiff(second, min(stamp), max(stamp))
from (select t.*
      from t
      where t.stamp <= (select t2.stamp
                        from t t2
                        where t2.uuid = t.uuid
                        order by t2.stamp asc
                        limit 1,1
                       )
     ) t
group by uuid;

SQL average time between first and second row in a set

Answers (2)

Related Questions