Reputation: 3272
I'm trying to calculate the cumulative revenue over time since 2020-01-01. I have user level revenue data with the following schema
create table revenue
(
game_id varchar(255),
user_id varchar(255),
amount int,
activity_date varchar(255)
);
insert into revenue
(game_id, user_id, amount, activity_date)
values
('Racing', 'ABC123', 5, '2020-01-01'),
('Racing', 'ABC123', 1, '2020-01-04'),
('Racing', 'CDE123', 1, '2020-01-04'),
('DH', 'CDE123', 100, '2020-01-03'),
('DH', 'CDE456', 10, '2020-01-02'),
('DH', 'CDE789', 5, '2020-01-02'),
('DH', 'CDE456', 1, '2020-01-03'),
('DH', 'CDE456', 1, '2020-01-03');
Expected Output
Game Age Cum_rev Total_unique_payers_per_game
Racing 0 5 2
Racing 1 5 2
Racing 2 5 2
Racing 3 7 2
DH 0 0 3
DH 1 15 3
DH 2 117 3
DH 3 117 3
Age is calculated as the difference between transaction date and 2020-01-01. I'm using the below logic
SELECT game_id, DATEDIFF(activity_date ,'2020-01-01') as Age,count(user_id) as Total_unique_payers
from REVENUE
SQL fiddle How do I calculate the cumulative revenue?
Upvotes: 2
Views: 859
Reputation: 23361
Using MySQL 5.7 the only way of doing it is using its variables system, its trick though it works. It simulates the the Windowing Functions used by @Used_By_Already on his answer
Since you mentioned that you care about the gaps, you need first to create a dates table, that's easily done as:
create table dates_view (
date_day date
);
insert into dates_view
select date_add( '2019-12-31', INTERVAL @rownum:=@rownum+1 day ) as date_day
from (
select 0 union select 1 union select 2 union select 3
union select 4 union select 5 union select 6
union select 7 union select 8 union select 9
) a, (
select 0 union select 1 union select 2 union select 3
union select 4 union select 5 union select 6
union select 7 union select 8 union select 9
) b, (select @rownum:=0) r;
-- Note: each set of select union above will multiply the number
-- of days by 10, so if you need more days in your table just add more
-- set as above "a" or "b" sets
After having your Dates table you have to cross join it with your current revenue
table the catch is you want the number of players to be independent of the cumulative amount
so you need to compute it independently in a subquery.
You also need to compute the max(activity_date)
of your revenue
table in order to limit the results until to it.
So below query will do just it (based on your current sample data):
set @_sum:=0; -- Note: this two lines depends on the client
set @_currGame:=''; -- you are using. Some accumulate variable per session
-- some doesn't, below site, for instance does
select a.game_id,
a.age,
case when @_currGame = game_id
then @_sum:=coalesce(samount,0) + @_sum
else @_sum:=coalesce(samount,0) end as Cum_rev,
a.Total_unique_payers_per_game,
@_currGame := game_id varComputeCurrGame
from
(
select players.game_id,
rev.samount,
datediff(dv.date_day, '2020-01-01') age,
players.noPlayers Total_unique_payers_per_game
from (select @_sum:=0) am,
dates_view dv
cross join (select max(activity_date) maxDate from revenue) md
on dv.date_day <= md.maxDate
cross join (select game_id, count(distinct user_id) noPlayers
from revenue group by game_id) players
left join (select game_id, activity_date, sum(amount) samount
from revenue group by game_id, activity_date) rev
on players.game_id = rev.game_id
and dv.date_day = rev.activity_date
) a,
(select @_sum:=0) s,
(select @_currGame='') x
order by a.game_id desc, a.age;
This will result in:
game_id age Cum_rev Total_unique_payers_per_game varComputeCurrGame
Racing 0 5 2 Racing
Racing 1 5 2 Racing
Racing 2 5 2 Racing
Racing 3 7 2 Racing
DH 0 0 3 DH
DH 1 15 3 DH
DH 2 117 3 DH
DH 3 117 3 DH
See it working here (you need to run it): https://www.db-fiddle.com/f/qifZ6hmpvcSZYwhLDv613d/2
Here is a version for MySQL 8.x that supports windowing functions:
select distinct agetable.game_id,
agetable.age,
sum(coalesce(r1.amount,0))
over (partition by agetable.game_id
order by agetable.game_id, agetable.age) as sm,
agetable.ttplayers
from
(
select r.game_id, dv.date_day, datediff(dv.date_day, '2020-01-01') age, p.ttplayers
from dates_view dv
cross join (select distinct game_id, activity_date from revenue) r
on dv.date_day <= (select max(activity_date) from revenue)
left join (select game_id, count(distinct user_id) ttplayers from revenue group by game_id) p
on r.game_id = p.game_id
group by r.game_id desc, dv.date_day, age, p.ttplayers
) agetable
left join revenue r1
on agetable.date_day = r1.activity_date
and r1.game_id = agetable.game_id
order by agetable.game_id desc, agetable.age
Upvotes: 1
Reputation: 35573
For the following you need a version of MySQL that supports the over()
clause (MySQL 8+) - I used MariaDB 10.4 below (MySQL 8 wasn't working at the site when I tried)
create table revenue ( game_id varchar(255), user_id varchar(255), amount int, activity_date varchar(255) ); insert into revenue (game_id, user_id, amount, activity_date) values ('Racing', 'ABC123', 5, '2020-01-01'), ('Racing', 'ABC123', 1, '2020-01-04'), ('Racing', 'CDE123', 1, '2020-01-04'), ('DH', 'CDE123', 100, '2020-01-03'), ('DH', 'CDE456', 10, '2020-01-02'), ('DH', 'CDE789', 5, '2020-01-02'), ('DH', 'CDE456', 1, '2020-01-03'), ('DH', 'CDE456', 1, '2020-01-03');
✓ ✓
SELECT game_id , user_id , activity_date , amount , sum(amount) over(order by activity_date, user_id) as running_sum , (select count(distinct user_id) from revenue) as Total_unique_payers from revenue order by activity_date , user_id
game_id | user_id | activity_date | amount | running_sum | Total_unique_payers :------ | :------ | :------------ | -----: | ----------: | ------------------: Racing | ABC123 | 2020-01-01 | 5 | 5 | 4 DH | CDE456 | 2020-01-02 | 10 | 15 | 4 DH | CDE789 | 2020-01-02 | 5 | 20 | 4 DH | CDE123 | 2020-01-03 | 100 | 120 | 4 DH | CDE456 | 2020-01-03 | 1 | 122 | 4 DH | CDE456 | 2020-01-03 | 1 | 122 | 4 Racing | ABC123 | 2020-01-04 | 1 | 123 | 4 Racing | CDE123 | 2020-01-04 | 1 | 124 | 4
db<>fiddle here
Changing the order of calculation inside the over clause affects how the running sum is calculated: e.g.
SELECT game_id , user_id , activity_date , amount , sum(amount) over(order by game_id DESC, activity_date, user_id) as running_sum , (select count(distinct user_id) from revenue) as Total_unique_payers from revenue order by game_id DESC , activity_date , user_id
game_id | user_id | activity_date | amount | running_sum | Total_unique_payers :------ | :------ | :------------ | -----: | ----------: | ------------------: Racing | ABC123 | 2020-01-01 | 5 | 5 | 4 Racing | ABC123 | 2020-01-04 | 1 | 6 | 4 Racing | CDE123 | 2020-01-04 | 1 | 7 | 4 DH | CDE456 | 2020-01-02 | 10 | 17 | 4 DH | CDE789 | 2020-01-02 | 5 | 22 | 4 DH | CDE123 | 2020-01-03 | 100 | 122 | 4 DH | CDE456 | 2020-01-03 | 1 | 124 | 4 DH | CDE456 | 2020-01-03 | 1 | 124 | 4
db<>fiddle here
Upvotes: 1