Reputation: 67
I have a table look like below....
ID HID Date UID
1 1 2012-01-01 1002
2 1 2012-01-24 2005
3 1 2012-02-15 5152
4 2 2012-01-01 6252
5 2 2012-01-19 10356
6 3 2013-01-06 10989
7 3 2013-03-25 25001
8 3 2014-01-14 35798
How can i group by HID, Year, Month and count(UID) and add a cumulative_sum (which is count of UID). So the final result look like this...
HID Year Month Count cumulative_sum
1 2012 01 2 2
1 2012 02 1 3
2 2012 01 2 2
3 2013 01 1 1
3 2013 03 1 2
3 2014 01 1 3
What's the best way to accomplish this using query?
Upvotes: 4
Views: 3990
Reputation: 33935
I made assumptions about the original data set. You should be able to adapt this to the revised dataset - although note that the solution using variables (instead of my self-join) is faster...
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(ID INT NOT NULL
,Date DATE NOT NULL
,UID INT NOT NULL PRIMARY KEY
);
INSERT INTO my_table VALUES
(1 ,'2012-01-01', 1002),
(1 ,'2012-01-24', 2005),
(1 ,'2012-02-15', 5152),
(2 ,'2012-01-01', 6252),
(2 ,'2012-01-19', 10356),
(3 ,'2013-01-06', 10989),
(3 ,'2013-03-25', 25001),
(3 ,'2014-01-14', 35798);
SELECT a.*
, SUM(b.count) cumulative
FROM
(
SELECT x.id,YEAR(date) year,MONTH(date) month, COUNT(0) count FROM my_table x GROUP BY id,year,month
) a
JOIN
(
SELECT x.id,YEAR(date) year,MONTH(date) month, COUNT(0) count FROM my_table x GROUP BY id,year,month
) b
ON b.id = a.id AND (b.year < a.year OR (b.year = a.year AND b.month <= a.month)
)
GROUP
BY a.id, a.year,a.month;
+----+------+-------+-------+------------+
| id | year | month | count | cumulative |
+----+------+-------+-------+------------+
| 1 | 2012 | 1 | 2 | 2 |
| 1 | 2012 | 2 | 1 | 3 |
| 2 | 2012 | 1 | 2 | 2 |
| 3 | 2013 | 1 | 1 | 1 |
| 3 | 2013 | 3 | 1 | 2 |
| 3 | 2014 | 1 | 1 | 3 |
+----+------+-------+-------+------------+
If you don't mind an extra column in the result, you can simplify (and accelerate) the above, as follows:
SELECT x.*
, @running:= IF(@previous=x.id,@running,0)+x.count cumulative
, @previous:=x.id
FROM
( SELECT x.id,YEAR(date) year,MONTH(date) month, COUNT(0) count FROM my_table x GROUP BY id,year,month ) x
,( SELECT @cumulative := 0,@running:=0) vals;
Upvotes: 5
Reputation: 2174
The code turns out kind of messy, and it reads as follows:
SELECT
HID,
strftime('%Y', `Date`) AS Year,
strftime('%m', `Date`) AS Month,
COUNT(UID) AS Count,
(SELECT
COUNT(UID)
FROM your_db A
WHERE
A.HID=B.HID
AND
(strftime('%Y', A.`Date`) < strftime('%Y', B.`Date`)
OR
(strftime('%Y', A.`Date`) = strftime('%Y', B.`Date`)
AND
strftime('%m', A.`Date`) <= strftime('%m', B.`Date`)))) AS cumulative_count
FROM your_db B
GROUP BY HID, YEAR, MONTH
Though by using views, it should become much clearer:
CREATE VIEW temp_data AS SELECT
HID,
strftime('%Y', `Date`) as Year,
strftime('%m', `Date`) as Month,
COUNT(UID) as Count
FROM your_db GROUP BY HID, YEAR, MONTH;
Then your statement will read as follows:
SELECT
HID,
Year,
Month,
`Count`,
(SELECT SUM(`Count`)
FROM temp_data A
WHERE
A.HID = B.HID
AND
(A.Year < B.Year
OR
(A.Year = B.Year
AND
A.Month <= B.Month))) AS cumulative_sum
FROM temp_data B;
Upvotes: 1