codedancer
codedancer

Reputation: 1634

Rolling difference in group and divivded by group sum in Pandas

I am wondering if there's an easier/faster way (ideally in pipe method so it looks nicer!) I can work out the rolling difference divided by previous group sum. In the result outout, pc column is the column I am after.

import pandas as pd

df = pd.DataFrame(
    {
        "Date": ["2020-01-01", "2020-01-01", "2020-01-01", "2021-01-01", "2021-01-01", "2021-01-01", "2022-01-01", "2022-01-01", "2022-01-01"],
        "City": ["London", "New York", "Tokyo", "London", "New York", "Tokyo", "London", "New York", "Tokyo"],
        "Pop": [90, 70, 60, 85, 60, 45, 70, 40, 32],
    }
)

         Date      City  Pop
0  2020-01-01    London   90
1  2020-01-01  New York   70
2  2020-01-01     Tokyo   60
3  2021-01-01    London   85
4  2021-01-01  New York   60
5  2021-01-01     Tokyo   45
6  2022-01-01    London   70
7  2022-01-01  New York   40
8  2022-01-01     Tokyo   32
df['pop_diff'] = df.groupby(['City'])['Pop'].diff()
df['total'] = df.groupby('Date').Pop.transform('sum')
df['total_shift'] = df.groupby('City')['total'].shift()
df['pc'] = df['pop_diff'] / df['total_shift']

         Date      City  Pop  pop_diff  total  total_shift        pc
0  2020-01-01    London   90       NaN    220          NaN       NaN
1  2020-01-01  New York   70       NaN    220          NaN       NaN
2  2020-01-01     Tokyo   60       NaN    220          NaN       NaN
3  2021-01-01    London   85      -5.0    190        220.0 -0.022727
4  2021-01-01  New York   60     -10.0    190        220.0 -0.045455
5  2021-01-01     Tokyo   45     -15.0    190        220.0 -0.068182
6  2022-01-01    London   70     -15.0    142        190.0 -0.078947
7  2022-01-01  New York   40     -20.0    142        190.0 -0.105263
8  2022-01-01     Tokyo   32     -13.0    142        190.0 -0.068421

Upvotes: 2

Views: 59

Answers (1)

Laurent
Laurent

Reputation: 13518

Here is one way to do it with Pandas assign and pipe:

df = (
    df.assign(total=df.groupby("Date")["Pop"].transform("sum"))
    .pipe(
        lambda df_: df_.assign(
            pc=df_.groupby(["City"])
            .agg({"Pop": "diff", "total": "shift"})
            .pipe(lambda x: x["Pop"] / x["total"])
        )
    )
    .drop(columns="total")
)

Then:

print(df)
# Output
         Date      City  Pop        pc
0  2020-01-01    London   90       NaN
1  2020-01-01  New York   70       NaN
2  2020-01-01     Tokyo   60       NaN
3  2021-01-01    London   85 -0.022727
4  2021-01-01  New York   60 -0.045455
5  2021-01-01     Tokyo   45 -0.068182
6  2022-01-01    London   70 -0.078947
7  2022-01-01  New York   40 -0.105263
8  2022-01-01     Tokyo   32 -0.068421

Upvotes: 1

Related Questions