Reputation: 157
I have table [visit]. I need to get the rows grouped by user_id with sum of visit_duration_seconds for when order_number is null, for example, for user [2875636] i'll get: 61+151+33+13. Each row should include the sum of the rows before it.
Plz also refer RESULT column in expected result below
user_id starttime visit_duration_seconds order_number 2875636 2013-01-16 18:03:50 61 2875636 2013-01-16 18:08:18 151 2875636 2013-01-16 18:15:43 33 2875636 2013-01-16 18:16:37 13 2875636 2013-01-16 18:18:01 2011 10177888 2875636 2013-01-16 18:24:35 1172 10177884 2875636 2013-01-16 18:32:03 4731 2875636 2013-01-16 18:33:27 407 2875636 2013-01-16 18:37:29 74 2875636 2013-01-16 18:48:55 80 2875636 2013-01-16 19:05:00 1955 2875636 2013-01-16 19:14:12 326 2875636 2013-01-16 19:23:39 972 2875636 2013-01-16 19:33:05 5440 2875636 2013-01-16 19:35:48 43 2875636 2013-01-16 19:41:10 66 2875636 2013-01-16 19:42:03 100 2875636 2013-01-16 19:42:12 2414 10177940 2875636 2013-01-16 19:49:05 432 10177925 2875636 2013-01-16 19:50:19 183 2875636 2013-01-16 19:52:46 2061 2875636 2013-01-16 19:52:53 400 2875636 2013-01-16 20:00:47 338 2875636 2013-01-16 20:08:58 216 2875636 2013-01-16 20:14:21 58 2875636 2013-01-16 20:14:26 196 2875636 2013-01-16 20:19:14 2189 2875636 2013-01-16 20:21:29 424 2875636 2013-01-16 20:24:42 999 2875636 2013-01-16 21:01:39 1810 2875636 2013-01-16 21:02:54 525 2875636 2013-01-16 21:10:06 27 2875636 2013-01-16 21:12:08 282 2875636 2013-01-16 21:51:02 6 2875636 2013-01-16 22:18:34 173 2875636 2013-01-16 23:02:58 318 2875636 2013-01-16 23:45:37 207 3018868 2013-01-16 16:01:45 18 3018868 2013-01-16 16:16:45 39 3018868 2013-01-16 16:22:55 656 3018868 2013-01-16 16:25:54 1852 3018868 2013-01-16 16:29:23 688 3018868 2013-01-16 16:47:26 2258 10177846 3018868 2013-01-16 16:57:41 572 3018868 2013-01-16 17:06:47 1431 3018868 2013-01-16 17:18:32 29 3018868 2013-01-16 17:21:57 45 3018868 2013-01-16 17:29:23 16 3018868 2013-01-16 17:36:47 490
EXPECTED RESULT
user_id starttime visit_duration_seconds order_number RESULT
2875636 2013-01-16 18:03:50 61 61
2875636 2013-01-16 18:08:18 151 212
2875636 2013-01-16 18:15:43 33 245
2875636 2013-01-16 18:16:37 13 258
2875636 2013-01-16 18:18:01 2011 10177888 0
2875636 2013-01-16 18:24:35 1172 10177884 0
2875636 2013-01-16 18:32:03 4731 4731
2875636 2013-01-16 18:33:27 407 5138
2875636 2013-01-16 18:37:29 74 5212
2875636 2013-01-16 18:48:55 80 ...
2875636 2013-01-16 19:05:00 1955 ...
2875636 2013-01-16 19:14:12 326 ...
2875636 2013-01-16 19:23:39 972
2875636 2013-01-16 19:33:05 5440
2875636 2013-01-16 19:35:48 43
2875636 2013-01-16 19:41:10 66
2875636 2013-01-16 19:42:03 100
2875636 2013-01-16 19:42:12 2414 10177940
2875636 2013-01-16 19:49:05 432 10177925
2875636 2013-01-16 19:50:19 183
2875636 2013-01-16 19:52:46 2061
2875636 2013-01-16 19:52:53 400
2875636 2013-01-16 20:00:47 338
2875636 2013-01-16 20:08:58 216
2875636 2013-01-16 20:14:21 58
2875636 2013-01-16 20:14:26 196
2875636 2013-01-16 20:19:14 2189
2875636 2013-01-16 20:21:29 424
2875636 2013-01-16 20:24:42 999
2875636 2013-01-16 21:01:39 1810
2875636 2013-01-16 21:02:54 525
2875636 2013-01-16 21:10:06 27
2875636 2013-01-16 21:12:08 282
2875636 2013-01-16 21:51:02 6
2875636 2013-01-16 22:18:34 173
2875636 2013-01-16 23:02:58 318
2875636 2013-01-16 23:45:37 207
3018868 2013-01-16 16:01:45 18
3018868 2013-01-16 16:16:45 39
3018868 2013-01-16 16:22:55 656
3018868 2013-01-16 16:25:54 1852
3018868 2013-01-16 16:29:23 688
3018868 2013-01-16 16:47:26 2258 10177846
3018868 2013-01-16 16:57:41 572
3018868 2013-01-16 17:06:47 1431
3018868 2013-01-16 17:18:32 29
3018868 2013-01-16 17:21:57 45
3018868 2013-01-16 17:29:23 16
3018868 2013-01-16 17:36:47 490
Upvotes: 3
Views: 1379
Reputation: 108370
You can make use of MySQL user variables to emulate analytic functions. (There are some other approaches as well, like using a semi-join or using a correlated subquery. I can provide solutions for those as well, if you feel they may be more appropriate.)
For emulating a "running total" analytic function, try something like this:
SELECT t.user_id
, t.starttime
, t.order_number
, IF(t.order_number IS NOT NULL,
@tot_dur := 0,
@tot_dur := @tot_dur + t.visit_duration_seconds) AS tot_dur
FROM visit t
JOIN (SELECT @tot_dur := 0) d
ORDER BY t.user_id, t.start_time
The "trick" here is to use an IF function to test whether or not order_number
is null. When it's null, we add the duration value to the variable, otherwise, we set the variable to zero.
We use an inline view (aliased as d
, to ensure that the @tot_dur variable is initialized to zero.
NOTE: Take care with using MySQL user variables like this. In the SELECT statement as above, the assignment of the variables in the SELECT list happens after the ORDER BY, so we can get deterministic behavior.
That query does not handle "breaks" in user_id. To get that, we are going to need the value of user_id from the previous row. We can preserve that in another user variable. The order of the operations is deterministic, and we need to take care to do the accumulation BEFORE we overwrite the user_id from the previous row.
We either need to reorder the columns so that user_id appears after tot_dur (or include a second copy of the user_id column)
SELECT t.user_id
, t.starttime
, t.order_number
, IF(t.order_number IS NULL,
@tot_dur := IF(@prev_user_id = t.user_id,@tot_dur,0) + t.visit_duration_seconds,
@tot_dur := 0
) AS tot_dur
, @prev_user_id := t.user_id AS prev_user_id
FROM visit t
JOIN (SELECT @tot_dur := 0, @prev_user_id := NULL) d
ORDER BY t.user_id, t.start_time
The values returned in the user_id
and prev_user_id
columns is identical. That "extra" column could be removed, or the columns could be reordered by wrapping the query (as an inline view) in another query, although this comes at a performance cost:
SELECT v.user_id
, v.starttime
, v.order_number
, v.tot_dur
FROM (SELECT t.starttime
, t.order_number
, IF(t.order_number IS NULL,
@tot_dur := IF(@prev_user_id = t.user_id,@tot_dur,0) + t.visit_duration_seconds,
@tot_dur := 0
) AS tot_dur
, @prev_user_id := t.user_id AS user_id
FROM visit t
JOIN (SELECT @tot_dur := 0, @prev_user_id := NULL) d
ORDER BY t.user_id, t.start_time
) v
That query demonstrates that it is possible for MySQL to return the specified resultset. But for optimum performance, we'd want to run just the query in the inline view (aliased as v
), and handle the re-ordering of the columns (putting the user_id column first) on the client side, when the rows are retrieved.
The other two common approaches are using a semi-join, and using a correlated subquery, although these approaches can be more resource intensive when processing large sets.
Upvotes: 2