user1881820
user1881820

Reputation: 157

How to do a rolling sum, each row need to include the sum of previous rows

I have table [visit]. I need to get the rows grouped by user_id with sum of visit_duration_seconds for when order_number is null, for example, for user [2875636] i'll get: 61+151+33+13. Each row should include the sum of the rows before it.
Plz also refer RESULT column in expected result below

user_id   starttime           visit_duration_seconds  order_number
2875636   2013-01-16 18:03:50 61  
2875636   2013-01-16 18:08:18 151 
2875636   2013-01-16 18:15:43 33  
2875636   2013-01-16 18:16:37 13  
2875636   2013-01-16 18:18:01 2011                     10177888
2875636   2013-01-16 18:24:35 1172                     10177884
2875636   2013-01-16 18:32:03 4731    
2875636   2013-01-16 18:33:27 407 
2875636   2013-01-16 18:37:29 74  
2875636   2013-01-16 18:48:55 80  
2875636   2013-01-16 19:05:00 1955    
2875636   2013-01-16 19:14:12 326 
2875636   2013-01-16 19:23:39 972 
2875636   2013-01-16 19:33:05 5440    
2875636   2013-01-16 19:35:48 43  
2875636   2013-01-16 19:41:10 66  
2875636   2013-01-16 19:42:03 100 
2875636   2013-01-16 19:42:12 2414                     10177940
2875636   2013-01-16 19:49:05 432                  10177925
2875636   2013-01-16 19:50:19 183 
2875636   2013-01-16 19:52:46 2061    
2875636   2013-01-16 19:52:53 400 
2875636   2013-01-16 20:00:47 338 
2875636   2013-01-16 20:08:58 216 
2875636   2013-01-16 20:14:21 58  
2875636   2013-01-16 20:14:26 196 
2875636   2013-01-16 20:19:14 2189    
2875636   2013-01-16 20:21:29 424 
2875636   2013-01-16 20:24:42 999 
2875636   2013-01-16 21:01:39 1810    
2875636   2013-01-16 21:02:54 525 
2875636   2013-01-16 21:10:06 27  
2875636   2013-01-16 21:12:08 282 
2875636   2013-01-16 21:51:02 6   
2875636   2013-01-16 22:18:34 173 
2875636   2013-01-16 23:02:58 318 
2875636   2013-01-16 23:45:37 207 
3018868   2013-01-16 16:01:45 18  
3018868   2013-01-16 16:16:45 39  
3018868   2013-01-16 16:22:55 656 
3018868   2013-01-16 16:25:54 1852    
3018868   2013-01-16 16:29:23 688 
3018868   2013-01-16 16:47:26 2258                       10177846
3018868   2013-01-16 16:57:41 572 
3018868   2013-01-16 17:06:47 1431    
3018868   2013-01-16 17:18:32 29  
3018868   2013-01-16 17:21:57 45  
3018868   2013-01-16 17:29:23 16  
3018868   2013-01-16 17:36:47 490

EXPECTED RESULT

user_id starttime           visit_duration_seconds  order_number        RESULT
2875636 2013-01-16 18:03:50 61                                      61
2875636 2013-01-16 18:08:18 151                                     212
2875636 2013-01-16 18:15:43 33                                      245
2875636 2013-01-16 18:16:37 13                                      258
2875636 2013-01-16 18:18:01 2011                     10177888           0
2875636 2013-01-16 18:24:35 1172                     10177884           0
2875636 2013-01-16 18:32:03 4731                                        4731
2875636 2013-01-16 18:33:27 407                                     5138
2875636 2013-01-16 18:37:29 74                                      5212
2875636 2013-01-16 18:48:55 80                                      ...
2875636 2013-01-16 19:05:00 1955                                        ...
2875636 2013-01-16 19:14:12 326                                     ...
2875636 2013-01-16 19:23:39 972 
2875636 2013-01-16 19:33:05 5440    
2875636 2013-01-16 19:35:48 43  
2875636 2013-01-16 19:41:10 66  
2875636 2013-01-16 19:42:03 100 
2875636 2013-01-16 19:42:12 2414                     10177940
2875636 2013-01-16 19:49:05 432                  10177925
2875636 2013-01-16 19:50:19 183 
2875636 2013-01-16 19:52:46 2061    
2875636 2013-01-16 19:52:53 400 
2875636 2013-01-16 20:00:47 338 
2875636 2013-01-16 20:08:58 216 
2875636 2013-01-16 20:14:21 58  
2875636 2013-01-16 20:14:26 196 
2875636 2013-01-16 20:19:14 2189    
2875636 2013-01-16 20:21:29 424 
2875636 2013-01-16 20:24:42 999 
2875636 2013-01-16 21:01:39 1810    
2875636 2013-01-16 21:02:54 525 
2875636 2013-01-16 21:10:06 27  
2875636 2013-01-16 21:12:08 282 
2875636 2013-01-16 21:51:02 6   
2875636 2013-01-16 22:18:34 173 
2875636 2013-01-16 23:02:58 318 
2875636 2013-01-16 23:45:37 207 
3018868 2013-01-16 16:01:45 18  
3018868 2013-01-16 16:16:45 39  
3018868 2013-01-16 16:22:55 656 
3018868 2013-01-16 16:25:54 1852    
3018868 2013-01-16 16:29:23 688 
3018868 2013-01-16 16:47:26 2258                       10177846
3018868 2013-01-16 16:57:41 572 
3018868 2013-01-16 17:06:47 1431    
3018868 2013-01-16 17:18:32 29  
3018868 2013-01-16 17:21:57 45  
3018868 2013-01-16 17:29:23 16  
3018868 2013-01-16 17:36:47 490 

Upvotes: 3

Views: 1379

Answers (1)

spencer7593
spencer7593

Reputation: 108370

You can make use of MySQL user variables to emulate analytic functions. (There are some other approaches as well, like using a semi-join or using a correlated subquery. I can provide solutions for those as well, if you feel they may be more appropriate.)

For emulating a "running total" analytic function, try something like this:

SELECT t.user_id
     , t.starttime
     , t.order_number
     , IF(t.order_number IS NOT NULL,
         @tot_dur := 0,
         @tot_dur := @tot_dur + t.visit_duration_seconds) AS tot_dur
  FROM visit t
  JOIN (SELECT @tot_dur := 0) d
 ORDER BY t.user_id, t.start_time

The "trick" here is to use an IF function to test whether or not order_number is null. When it's null, we add the duration value to the variable, otherwise, we set the variable to zero.

We use an inline view (aliased as d, to ensure that the @tot_dur variable is initialized to zero.

NOTE: Take care with using MySQL user variables like this. In the SELECT statement as above, the assignment of the variables in the SELECT list happens after the ORDER BY, so we can get deterministic behavior.


That query does not handle "breaks" in user_id. To get that, we are going to need the value of user_id from the previous row. We can preserve that in another user variable. The order of the operations is deterministic, and we need to take care to do the accumulation BEFORE we overwrite the user_id from the previous row.

We either need to reorder the columns so that user_id appears after tot_dur (or include a second copy of the user_id column)

SELECT t.user_id
     , t.starttime
     , t.order_number
     , IF(t.order_number IS NULL,
         @tot_dur := IF(@prev_user_id = t.user_id,@tot_dur,0) + t.visit_duration_seconds,
         @tot_dur := 0
       ) AS tot_dur
     , @prev_user_id := t.user_id AS prev_user_id
  FROM visit t
  JOIN (SELECT @tot_dur := 0, @prev_user_id := NULL) d
 ORDER BY t.user_id, t.start_time

The values returned in the user_id and prev_user_id columns is identical. That "extra" column could be removed, or the columns could be reordered by wrapping the query (as an inline view) in another query, although this comes at a performance cost:

SELECT v.user_id
     , v.starttime
     , v.order_number
     , v.tot_dur
  FROM (SELECT t.starttime
             , t.order_number
             , IF(t.order_number IS NULL,
                 @tot_dur := IF(@prev_user_id = t.user_id,@tot_dur,0) + t.visit_duration_seconds,
                 @tot_dur := 0
               ) AS tot_dur
             , @prev_user_id := t.user_id AS user_id
          FROM visit t
          JOIN (SELECT @tot_dur := 0, @prev_user_id := NULL) d
         ORDER BY t.user_id, t.start_time
       ) v

That query demonstrates that it is possible for MySQL to return the specified resultset. But for optimum performance, we'd want to run just the query in the inline view (aliased as v), and handle the re-ordering of the columns (putting the user_id column first) on the client side, when the rows are retrieved.

The other two common approaches are using a semi-join, and using a correlated subquery, although these approaches can be more resource intensive when processing large sets.

Upvotes: 2

Related Questions