8-Bit Borges
8-Bit Borges

Reputation: 10043

Pandas - create new column with the sum of last N values of another column with groupby

I have this df:

       round_id          team       opponent  home_dummy   GC   GP  P
0          1.0       Flamengo    Atlético-MG         1.0  1.0  0.0  0
1          4.0       Flamengo         Grêmio         1.0  1.0  1.0  1
2          5.0       Flamengo       Botafogo         1.0  1.0  1.0  1
3          6.0       Flamengo         Santos         0.0  0.0  1.0  3
4          7.0       Flamengo          Bahia         0.0  3.0  5.0  3
5          8.0       Flamengo      Fortaleza         1.0  1.0  2.0  3
6          9.0       Flamengo     Fluminense         0.0  1.0  2.0  3
7         10.0       Flamengo          Ceará         0.0  2.0  0.0  0
8          3.0       Flamengo       Coritiba         0.0  0.0  1.0  3
9         11.0       Flamengo          Goiás         1.0  1.0  2.0  3
10        13.0       Flamengo   Athlético-PR         1.0  1.0  3.0  3
11        14.0       Flamengo          Sport         1.0  0.0  3.0  3
12        15.0       Flamengo          Vasco         0.0  1.0  2.0  3
13        16.0       Flamengo     Bragantino         1.0  1.0  1.0  1
14        17.0       Flamengo    Corinthians         0.0  1.0  5.0  3
15        18.0       Flamengo  Internacional         0.0  2.0  2.0  1
16        19.0       Flamengo      São Paulo         1.0  4.0  1.0  0
17        12.0       Flamengo      Palmeiras         0.0  1.0  1.0  1
18         2.0       Flamengo    Atlético-GO         0.0  3.0  0.0  0
19        20.0       Flamengo    Atlético-MG         0.0  4.0  0.0  0

and I've aded a new column to it, which creates a new column with the sum of last N values of another, like so:

df['Last_5'] = df.P.rolling(5,min_periods=1).sum().shift().fillna(0)

which gives me:

      round_id           team       opponent  home_dummy   GC   GP  P  last_5
0          1.0       Flamengo    Atlético-MG         1.0  1.0  0.0  0  0
1          4.0       Flamengo         Grêmio         1.0  1.0  1.0  1  0
2          5.0       Flamengo       Botafogo         1.0  1.0  1.0  1  1
3          6.0       Flamengo         Santos         0.0  0.0  1.0  3  2
4          7.0       Flamengo          Bahia         0.0  3.0  5.0  3  5
5          8.0       Flamengo      Fortaleza         1.0  1.0  2.0  3  8
6          9.0       Flamengo     Fluminense         0.0  1.0  2.0  3 11
7         10.0       Flamengo          Ceará         0.0  2.0  0.0  0 13
8          3.0       Flamengo       Coritiba         0.0  0.0  1.0  3 12
9         11.0       Flamengo          Goiás         1.0  1.0  2.0  3 12
10        13.0       Flamengo   Athlético-PR         1.0  1.0  3.0  3 12
11        14.0       Flamengo          Sport         1.0  0.0  3.0  3 12
12        15.0       Flamengo          Vasco         0.0  1.0  2.0  3 12
13        16.0       Flamengo     Bragantino         1.0  1.0  1.0  1 15
14        17.0       Flamengo    Corinthians         0.0  1.0  5.0  3 13
15        18.0       Flamengo  Internacional         0.0  2.0  2.0  1 11
16        19.0       Flamengo      São Paulo         1.0  4.0  1.0  0  8
17        12.0       Flamengo      Palmeiras         0.0  1.0  1.0  1  8
18         2.0       Flamengo    Atlético-GO         0.0  3.0  0.0  0  6
19        20.0       Flamengo    Atlético-MG         0.0  4.0  0.0  0  5

But lets say I have many teams in the same dataframe:

       round_id        team      opponent  home_dummy   GC   GP /
0          1.0     Flamengo   Atlético-MG         1.0  1.0  0.0
1          4.0     Flamengo        Grêmio         1.0  1.0  1.0
2          5.0     Flamengo      Botafogo         1.0  1.0  1.0
3          6.0     Flamengo        Santos         0.0  0.0  1.0
4          7.0     Flamengo         Bahia         0.0  3.0  5.0
..         ...          ...           ...         ...  ...  ...
395       15.0  Atlético-GO    Bragantino         1.0  1.0  2.0
396       16.0  Atlético-GO        Santos         0.0  0.0  1.0
397       17.0  Atlético-GO  Athlético-PR         1.0  1.0  1.0
398        9.0  Atlético-GO         Vasco         0.0  1.0  2.0
399       20.0  Atlético-GO   Corinthians         1.0  1.0  1.0

How do I apply the same calculation and achieve the same result per Team, without overlapping last N rows between teams?

Upvotes: 0

Views: 54

Answers (1)

BENY
BENY

Reputation: 323396

Adding the groupby

df['Last_5'] = df.groupby('team').P.apply(lambda x : x.rolling(5,min_periods=1).sum().shift().fillna(0))

Upvotes: 1

Related Questions