Reputation: 833
In Amazon Athena I have a table that looks like this:
id amount date
1 100 2018-04-05
1 50 2018-06-18
2 10 2018-04-23
2 100 2018-04-28
2 50 2018-07-07
2 10 2018-08-08
And I would like a result such as
id cum_sum date
1 100 2018-04
1 100 2018-05
1 150 2018-06
1 150 2018-07
1 150 2018-08
2 110 2018-04
2 110 2018-05
2 110 2018-06
2 160 2018-07
2 170 2018-08
So I would like to have the cumulative sum per ID per end of month (last day of month). I know how to do it month by month, but not in one query.
Another problem is filling in the empty months (i.e. ID 1 does not have entries for every month, so the previous cumulative sum has to be reused).
If there's a solution for MySQL as well, I'd appreciate it too.
I hope this makes sense, and thanks in advance.
Upvotes: 5
Views: 12285
Reputation: 1269823
You can use window functions in PrestoDB. You could generate the dates, but it is also simple enough to just list them out:
with months as (
    select date '2018-04-01' as yyyy_mm union all  -- use the first of the month
    select date '2018-05-01' union all
    select date '2018-06-01' union all
    select date '2018-07-01' union all
    select date '2018-08-01'
)
select i.id, m.yyyy_mm, sum(t.amount) as month_amount,
       -- running total of the monthly sums; months without rows carry the previous total
       sum(sum(t.amount)) over (partition by i.id order by m.yyyy_mm) as cumulative_amt
from (select distinct id from t) i cross join
     months m left join
     t
     on t.id = i.id and
        -- half-open range covering the whole month (assumes t.date is a DATE column)
        t.date >= m.yyyy_mm and
        t.date < m.yyyy_mm + interval '1' month
group by i.id, m.yyyy_mm
order by i.id, m.yyyy_mm;
This should also work in MySQL 8.0. In earlier versions, you would need variables or a correlated subquery. The first would not work in PrestoDB. The second might have much worse performance.
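The "generate the dates" option could look something like the sketch below. It assumes Athena's Presto engine, which provides sequence() and UNNEST; the start and end dates are hard-coded purely for illustration, and the alias s is arbitrary. The CTE would take the place of the hand-written months list.

```sql
-- Sketch: build one row per month instead of listing the months by hand.
with months as (
    select yyyy_mm
    from unnest(sequence(date '2018-04-01',
                         date '2018-08-01',
                         interval '1' month)) as s(yyyy_mm)
)
select yyyy_mm
from months
order by yyyy_mm;
```

The rest of the query (the cross join against the distinct ids and the windowed sum) would stay as it is.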
Upvotes: 3
Reputation: 521279
Here is a MySQL 8+ solution, which can be adapted to earlier versions or to another database that supports CTEs. It uses calendar tables for the id values and dates. After generating the amounts across months/ids, it takes a cumulative sum to obtain the final result.
WITH ids AS (
    SELECT 1 AS id FROM dual UNION ALL
    SELECT 2 FROM dual
),
months AS (
    SELECT '2018-04-01' AS month UNION ALL    -- use the first of the month
    SELECT '2018-05-01' UNION ALL             -- to represent a given month
    SELECT '2018-06-01' UNION ALL
    SELECT '2018-07-01' UNION ALL
    SELECT '2018-08-01'
),
cte AS (
    SELECT
        i.id,
        m.month,
        SUM(amount) AS amount
    FROM ids i
    CROSS JOIN months m
    LEFT JOIN yourTable t
        ON t.id = i.id AND
           t.date >= m.month AND t.date < DATE_ADD(m.month, INTERVAL 1 MONTH)
    GROUP BY
        i.id,
        m.month
)
SELECT
    id,
    (SELECT SUM(t2.amount) FROM cte t2
     WHERE t1.id = t2.id AND t2.month <= t1.month) cum_sum,
    DATE_FORMAT(month, '%Y-%m') AS date
FROM cte t1
ORDER BY
    id,
    month;
Porting this to an earlier version of MySQL mainly means replacing the CTEs with inline subqueries, since WITH is only available from MySQL 8.0. On PrestoDB the CTEs can stay, but the date function logic (DATE_ADD in particular) and the FROM dual references would need adapting. Other than that, the query should work unchanged.
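If you only need MySQL 8.0+, one possible variation is to generate the months with a recursive CTE and swap the correlated subquery for a window function. This is only a sketch under some assumptions: the table is called yourTable as above, the date range is hard-coded, and month_start is an illustrative column name chosen here rather than anything from the original query.

```sql
-- Sketch for MySQL 8.0+: recursive month list plus a windowed running total.
WITH RECURSIVE months AS (
    SELECT DATE '2018-04-01' AS month_start
    UNION ALL
    SELECT month_start + INTERVAL 1 MONTH
    FROM months
    WHERE month_start < DATE '2018-08-01'
),
ids AS (
    SELECT DISTINCT id FROM yourTable
),
cte AS (
    -- one row per id and month; amount is NULL for months without entries
    SELECT
        i.id,
        m.month_start,
        SUM(t.amount) AS amount
    FROM ids i
    CROSS JOIN months m
    LEFT JOIN yourTable t
        ON t.id = i.id AND
           t.date >= m.month_start AND
           t.date < m.month_start + INTERVAL 1 MONTH
    GROUP BY
        i.id,
        m.month_start
)
SELECT
    id,
    -- SUM ignores NULLs, so empty months repeat the previous running total
    SUM(amount) OVER (PARTITION BY id ORDER BY month_start) AS cum_sum,
    DATE_FORMAT(month_start, '%Y-%m') AS date
FROM cte
ORDER BY
    id,
    month_start;
```

With this version the ids come from the table itself rather than from a hand-written list, at the cost of one extra scan of yourTable.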
Upvotes: 0