Artem

Reputation: 833

Cumulative sum by id and by month in Presto

In Amazon Athena I have a table that looks like this:

id   amount date
1    100    2018-04-05
1    50     2018-06-18
2    10     2018-04-23
2    100    2018-04-28
2    50     2018-07-07
2    10     2018-08-08

And I would like a result such as

id   cum_sum  date
1    100      2018-04
1    100      2018-05
1    150      2018-06
1    150      2018-07
1    150      2018-08
2    110      2018-04
2    110      2018-05
2    110      2018-06
2    160      2018-07
2    170      2018-08

So I would like to have the cumulative sum per ID per end of month (last day of month). I know how to do it month by month, but not in one query.

Another problem is filling in the empty months (e.g. ID 1 does not have entries for every month, so its cumulative sum just has to be reused).

If there's a solution for MySQL as well, I'd appreciate it too.

I hope this makes sense, and thanks in advance.

Upvotes: 5

Views: 12285

Answers (2)

Gordon Linoff

Reputation: 1269823

You can use window functions in PrestoDB. You can generate the months, but it is also simple enough to just list them out:

with months as (
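      select date '2018-04-01' as yyyy_mm union all    -- use the first of the month
      select date '2018-05-01' union all               -- date literals, so the comparisons
      select date '2018-06-01' union all               -- and interval arithmetic below are typed
      select date '2018-07-01' union all
      select date '2018-08-01'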
     )
select i.id, m.yyyy_mm, sum(t.amount) as month_amount,
       sum(sum(t.amount)) over (partition by i.id order by m.yyyy_mm) as cumulative_amt
from (select distinct id from t) i cross join
     months m left join
     t
     on t.id = i.id and
        t.date >= m.yyyy_mm and
        t.date < m.yyyy_mm + interval '1' month
group by i.id, m.yyyy_mm
order by i.id, m.yyyy_mm;

This should also work in MySQL 8.0. In earlier versions, you would need variables or a correlated subquery. The first would not work in PrestoDB. The second might have much worse performance.
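
The month list can also be generated rather than typed out. The following is only a sketch for PrestoDB, using its `sequence` and `unnest` functions; it assumes the table is called `t` with the `id`, `amount`, and `date` columns from the question, and the range endpoints and the `month_label` alias are illustrative choices, not part of the original query. Unlike the listed version, it is not expected to run on MySQL.

```sql
-- Sketch only: table name t and the date range are assumptions based on the question's sample data.
with months as (
      -- one row per month, each represented by its first day
      select yyyy_mm
      from unnest(sequence(date '2018-04-01', date '2018-08-01', interval '1' month)) as s(yyyy_mm)
     )
select i.id,
       -- inner sum: total per id and month; outer window sum: running total per id
       sum(sum(t.amount)) over (partition by i.id order by m.yyyy_mm) as cum_sum,
       date_format(m.yyyy_mm, '%Y-%m') as month_label
from (select distinct id from t) i cross join
     months m left join
     t
     on t.id = i.id and
        t.date >= m.yyyy_mm and
        t.date < m.yyyy_mm + interval '1' month
group by i.id, m.yyyy_mm
order by i.id, m.yyyy_mm;
```

A month without rows for an id yields NULL from the inner sum, which the window sum skips, so the previous total carries forward. One caveat: an id with no rows in or before the first generated month would show NULL until its first entry.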

Upvotes: 3

Tim Biegeleisen

Reputation: 521279

Here is a MySQL 8+ solution, which can be adapted to earlier versions or to another database that supports CTEs. It uses calendar tables for the id values and the months. After summing the amounts per id and month, it takes a cumulative sum over those monthly amounts to obtain the final result.

WITH ids AS (
    SELECT 1 AS id FROM dual UNION ALL
    SELECT 2 FROM dual
),
months AS (
    SELECT '2018-04-01' AS month UNION ALL    -- use the first of the month
    SELECT '2018-05-01' UNION ALL             -- to represent a given month
    SELECT '2018-06-01' UNION ALL
    SELECT '2018-07-01' UNION ALL
    SELECT '2018-08-01'
),
cte AS (
    SELECT
        i.id,
        m.month,
        SUM(amount) AS amount
    FROM ids i
    CROSS JOIN months m
    LEFT JOIN yourTable t
        ON t.id = i.id AND
           t.date >= m.month AND t.date < DATE_ADD(m.month, INTERVAL 1 MONTH)
    GROUP BY
        i.id,
        m.month
)

SELECT
    id,
    (SELECT SUM(t2.amount) FROM cte t2
     WHERE t1.id = t2.id AND t2.month <= t1.month) cum_sum,
    DATE_FORMAT(month, '%Y-%m') AS date
FROM cte t1
ORDER BY
    id,
    month;

The major challenge in getting the above to work on an earlier version of MySQL would be replacing the CTEs (for example with derived tables), since MySQL supports them only from 8.0 onward. On PrestoDB, the date function logic would need adapting. Other than that, the query should need little change.
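
On MySQL 8.0 itself, the hard-coded calendar lists and the correlated subquery could be swapped for a recursive CTE and a window function. This is a different technique from the query above, shown only as a sketch: `yourTable` is the same placeholder table name, `month_start` is an illustrative column name, and the date range is taken from the question's sample data.

```sql
-- Sketch only (MySQL 8.0+): placeholder table yourTable, date range assumed from the sample data.
WITH RECURSIVE months AS (
    SELECT DATE '2018-04-01' AS month_start        -- first month, represented by its first day
    UNION ALL
    SELECT month_start + INTERVAL 1 MONTH          -- step forward one month at a time
    FROM months
    WHERE month_start < DATE '2018-08-01'          -- stop once the last month has been produced
)
SELECT
    i.id,
    -- inner SUM: amount per id and month; outer window SUM: running total per id
    SUM(SUM(t.amount)) OVER (PARTITION BY i.id ORDER BY m.month_start) AS cum_sum,
    DATE_FORMAT(m.month_start, '%Y-%m') AS date
FROM (SELECT DISTINCT id FROM yourTable) i
CROSS JOIN months m
LEFT JOIN yourTable t
    ON t.id = i.id AND
       t.date >= m.month_start AND
       t.date < m.month_start + INTERVAL 1 MONTH
GROUP BY
    i.id,
    m.month_start
ORDER BY
    i.id,
    m.month_start;
```

Taking the ids from the table itself avoids maintaining a separate list of ids, at the cost of an extra scan for the `DISTINCT`. Neither the recursive CTE nor the window function is available before MySQL 8.0, so this variant does not help on earlier versions.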

Upvotes: 0
