Group by sum with running calculation

Question

I have df as

Sales | Users |   DT
40000    234     1/4/2018
500      100     1/23/2018
4000     555     2/4/2018
234      1000    6/4/2018
1000     2000    6/14/2018
50000    1200    9/12/2018
40000    234     1/14/2019
500      100     2/23/2019
2145     555     2/26/2019
344      1000    6/14/2019
1234     1233    6/21/2019
432      5000     11/12/2019

Is there method to group-by to both sum up sales by yr and also create a rolling total of users as my users column is composed of both new users, and users from previous date so basically group by year, sum sales, and then sum users from next date - prev date users

Thank you

Laurent · Accepted Answer

With the dataframe you provided:

import pandas as pd


df = pd.DataFrame(
    {
        "Sales": [40000, 500, 4000, 234, 1000, 50000, 40000, 500, 2145, 344, 1234, 432],
        "Users": [234, 100, 555, 1000, 2000, 1200, 234, 100, 555, 1000, 1233, 5000],
        "DT": ["1/4/2018", "1/23/2018", "2/4/2018", "6/4/2018", "6/14/2018", "9/12/2018", "1/14/2019", "2/23/2019", "2/26/2019", "6/14/2019", "6/21/2019", "11/12/2019"],
    }
)

Here is one way to do it using to_datetime and cumsum:

df["DT"] = pd.to_datetime(df["DT"]).dt.year

df = (
    df.groupby("DT")
    .agg(sum)
    .pipe(lambda df_: df_.assign(Users=df_["Users"].cumsum()))
    .reset_index()
)

print(df)
# Output
     DT  Sales  Users
0  2018  95734   5089
1  2019  44655  13211

Group by sum with running calculation

Answers (1)

Related Questions