Reputation: 2110
Imagine a SQL table votes
, with two columns vote
and time
; vote
is a string, time
is a timestamp. I'd like to retrieve a "cumulative count" from the table. So, the table looks like this:
vote | time
-----------+----------------
stuart | 1443000000
stuart | 1443000000
bryan | 1443000000
stuart | 1443000001
bryan | 1443000001
bryan | 1443000002
bryan | 1443000002
so, "stuart" gets two votes in the first time interval and "bryan" gets one; they get one each in the second interval, and bryan gets two more in the third, and thus the output results I would like are these:
vote | time | cumulative total
-----------+--------------+-----------------
stuart | 1443000000 | 2
bryan | 1443000000 | 1
stuart | 1443000001 | 3
bryan | 1443000001 | 2
stuart | 1443000002 | 3
bryan | 1443000002 | 4
I'm not sure how to do this. Obviously, getting how many votes happened in each time interval is trivial (SELECT vote, time, COUNT(*) from table GROUP BY time, vote ORDER BY time, vote
) but making a cumulative count seems harder. Is there a way to do so?
(I'm using MySQL, and am happy to use MySQL-specific functions here if useful.)
Upvotes: 5
Views: 4954
Reputation: 220797
MySQL 8 introduced support for window functions, which are the perfect tool for cumulative sums. They will perform much better than the alternatives using correlated subqueries. Your desired query can be written as follows:
SELECT
vote,
time,
sum(count(*)) OVER (PARTITION BY vote ORDER BY time)
FROM table
GROUP BY time, vote
ORDER BY time, vote
Note that the inner count(*)
is an ordinary aggregate function, which is nested in the outer sum(...) OVER (...)
window function. This is possible because aggregate functions are logically calculated before window functions.
Upvotes: 3
Reputation: 9618
Ok, if you really need that stuart | 1443000002 | 3
row then you can do this:
select distinct vote, time,
(select count(*) from `table` t2
where t2.vote = t1.vote
and t2.time <= t1.time) `cumulative total`
from (select distinct t3.vote vote, t4.time time
from `table` t3, `table` t4) t1;
Here is the SQL Fiddle: http://sqlfiddle.com/#!9/960d6/11/0
What we need to do is make a table that contains EVERY combination of names and times (a join with no conditions) and then calculate the counts from the original table using all those combinations.
Thanks to @jarlh for the query in his answer - not having to start from scratch saved a bunch of time.
Upvotes: 0
Reputation: 44766
Using a correlated sub-query to do the counting:
select distinct vote, time, (select count(*) from table t2
where t2.vote = t1.vote
and t2.time <= t1.time)
from table t1
Upvotes: 6