shockwave
shockwave

Reputation: 3272

SQL Calculate cumulative value over a period

I'm trying to calculate the cumulative revenue over time since 2020-01-01. I have user level revenue data with the following schema

create table revenue
(
  game_id        varchar(255),
  user_id        varchar(255),
  amount         int,
  activity_date  varchar(255)
);

insert into revenue
  (game_id, user_id, amount, activity_date)
values
  ('Racing', 'ABC123', 5, '2020-01-01'),
  ('Racing', 'ABC123', 1, '2020-01-04'),
  ('Racing', 'CDE123', 1, '2020-01-04'),
  ('DH', 'CDE123', 100, '2020-01-03'),
  ('DH', 'CDE456', 10, '2020-01-02'),
  ('DH', 'CDE789', 5, '2020-01-02'),
  ('DH', 'CDE456', 1, '2020-01-03'),
  ('DH', 'CDE456', 1, '2020-01-03');

Expected Output

Game    Age    Cum_rev    Total_unique_payers_per_game
Racing  0      5          2
Racing  1      5          2
Racing  2      5          2
Racing  3      7          2
DH      0      0          3
DH      1      15         3
DH      2      117        3
DH      3      117        3

Age is calculated as the difference between transaction date and 2020-01-01. I'm using the below logic

SELECT game_id, DATEDIFF(activity_date ,'2020-01-01') as Age,count(user_id) as Total_unique_payers
from REVENUE

SQL fiddle How do I calculate the cumulative revenue?

Upvotes: 2

Views: 859

Answers (2)

Jorge Campos
Jorge Campos

Reputation: 23361

Using MySQL 5.7 the only way of doing it is using its variables system, its trick though it works. It simulates the the Windowing Functions used by @Used_By_Already on his answer

Since you mentioned that you care about the gaps, you need first to create a dates table, that's easily done as:

create table dates_view (
  date_day date
);

insert into dates_view
select date_add( '2019-12-31', INTERVAL @rownum:=@rownum+1 day ) as date_day
from (
   select 0 union select 1 union select 2 union select 3 
   union select 4 union select 5 union select 6 
   union select 7 union select 8 union select 9
) a, (
   select 0 union select 1 union select 2 union select 3 
   union select 4 union select 5 union select 6 
   union select 7 union select 8 union select 9
) b, (select @rownum:=0) r;

-- Note: each set of select union above will multiply the number 
-- of days by 10, so if you need more days in your table just add more
-- set as above "a" or "b" sets

After having your Dates table you have to cross join it with your current revenue table the catch is you want the number of players to be independent of the cumulative amount so you need to compute it independently in a subquery.

You also need to compute the max(activity_date) of your revenue table in order to limit the results until to it.

So below query will do just it (based on your current sample data):

set @_sum:=0;       -- Note: this two lines depends on the client
set @_currGame:=''; -- you are using. Some accumulate variable per session
                    -- some doesn't, below site, for instance does

select a.game_id,
       a.age,
       case when @_currGame = game_id 
            then @_sum:=coalesce(samount,0) + @_sum
            else @_sum:=coalesce(samount,0) end as Cum_rev,
       a.Total_unique_payers_per_game,
       @_currGame := game_id varComputeCurrGame
from 
    (
    select players.game_id, 
           rev.samount,
           datediff(dv.date_day, '2020-01-01') age,
           players.noPlayers Total_unique_payers_per_game
       from (select @_sum:=0) am,
            dates_view dv
             cross join (select max(activity_date) maxDate from revenue) md 
               on dv.date_day <= md.maxDate
             cross join (select game_id, count(distinct user_id) noPlayers 
                           from revenue group by game_id) players
             left join (select game_id, activity_date, sum(amount) samount 
                          from revenue group by game_id, activity_date) rev
                on players.game_id = rev.game_id
                   and dv.date_day = rev.activity_date
    ) a,
    (select @_sum:=0) s,
    (select @_currGame='') x
order by a.game_id desc, a.age;

This will result in:

  game_id   age  Cum_rev  Total_unique_payers_per_game   varComputeCurrGame
   Racing    0      5             2                            Racing
   Racing    1      5             2                            Racing
   Racing    2      5             2                            Racing
   Racing    3      7             2                            Racing
   DH        0      0             3                            DH    
   DH        1      15            3                            DH    
   DH        2      117           3                            DH    
   DH        3      117           3                            DH  

See it working here (you need to run it): https://www.db-fiddle.com/f/qifZ6hmpvcSZYwhLDv613d/2

Here is a version for MySQL 8.x that supports windowing functions:

select distinct agetable.game_id,
       agetable.age,
       sum(coalesce(r1.amount,0)) 
             over (partition by agetable.game_id 
                     order by agetable.game_id, agetable.age) as sm,
       agetable.ttplayers
from
    (
    select r.game_id, dv.date_day, datediff(dv.date_day, '2020-01-01') age, p.ttplayers
    from dates_view dv
          cross join (select distinct game_id, activity_date from revenue) r 
            on dv.date_day <= (select max(activity_date) from revenue)
          left join (select game_id, count(distinct user_id) ttplayers from revenue group by game_id) p
            on r.game_id = p.game_id
    group by r.game_id desc, dv.date_day, age, p.ttplayers
    ) agetable
    left join revenue r1
      on agetable.date_day = r1.activity_date
         and r1.game_id = agetable.game_id
order by agetable.game_id desc, agetable.age

Upvotes: 1

Paul Maxwell
Paul Maxwell

Reputation: 35573

For the following you need a version of MySQL that supports the over() clause (MySQL 8+) - I used MariaDB 10.4 below (MySQL 8 wasn't working at the site when I tried)

create table revenue
(
  game_id        varchar(255),
  user_id        varchar(255),
  amount         int,
  activity_date  varchar(255)
);

insert into revenue
  (game_id, user_id, amount, activity_date)
values
  ('Racing', 'ABC123', 5, '2020-01-01'),
  ('Racing', 'ABC123', 1, '2020-01-04'),
  ('Racing', 'CDE123', 1, '2020-01-04'),
  ('DH', 'CDE123', 100, '2020-01-03'),
  ('DH', 'CDE456', 10, '2020-01-02'),
  ('DH', 'CDE789', 5, '2020-01-02'),
  ('DH', 'CDE456', 1, '2020-01-03'),
  ('DH', 'CDE456', 1, '2020-01-03');
  
  
✓

✓
SELECT
  game_id
, user_id
, activity_date
, amount
, sum(amount) over(order by activity_date, user_id) as running_sum
, (select count(distinct user_id) from revenue) as Total_unique_payers
from revenue
order by
  activity_date
, user_id
game_id | user_id | activity_date | amount | running_sum | Total_unique_payers
:------ | :------ | :------------ | -----: | ----------: | ------------------:
Racing  | ABC123  | 2020-01-01    |      5 |           5 |                   4
DH      | CDE456  | 2020-01-02    |     10 |          15 |                   4
DH      | CDE789  | 2020-01-02    |      5 |          20 |                   4
DH      | CDE123  | 2020-01-03    |    100 |         120 |                   4
DH      | CDE456  | 2020-01-03    |      1 |         122 |                   4
DH      | CDE456  | 2020-01-03    |      1 |         122 |                   4
Racing  | ABC123  | 2020-01-04    |      1 |         123 |                   4
Racing  | CDE123  | 2020-01-04    |      1 |         124 |                   4

db<>fiddle here

Changing the order of calculation inside the over clause affects how the running sum is calculated: e.g.

SELECT
  game_id
, user_id
, activity_date
, amount
, sum(amount) over(order by game_id DESC, activity_date, user_id) as running_sum
, (select count(distinct user_id) from revenue) as Total_unique_payers
from revenue
order by
  game_id DESC
, activity_date
, user_id
game_id | user_id | activity_date | amount | running_sum | Total_unique_payers
:------ | :------ | :------------ | -----: | ----------: | ------------------:
Racing  | ABC123  | 2020-01-01    |      5 |           5 |                   4
Racing  | ABC123  | 2020-01-04    |      1 |           6 |                   4
Racing  | CDE123  | 2020-01-04    |      1 |           7 |                   4
DH      | CDE456  | 2020-01-02    |     10 |          17 |                   4
DH      | CDE789  | 2020-01-02    |      5 |          22 |                   4
DH      | CDE123  | 2020-01-03    |    100 |         122 |                   4
DH      | CDE456  | 2020-01-03    |      1 |         124 |                   4
DH      | CDE456  | 2020-01-03    |      1 |         124 |                   4

db<>fiddle here

Upvotes: 1

Related Questions