Entering df.groupby....max() result into a new column. Pandas

Question

My data is related to "Cricket", sports game (like Baseball). It has 20 overs for each inning max and each over has approx 6 balls.

data:

        season  match_id    inning  sum_total_runs  sum_total_wickets   over/ball   innings_score
32      2008    60          1       61              0                   5.1         0
33      2008    60          1       61              1                   5.2         0
34      2008    60          1       61              1                   5.3         0
35      2008    60          1       61              1                   5.4         0
36      2008    60          1       61              1                   5.5         0
...     ...     ...         ...     ...             ...                 ...         ...
179073  2019    11415       2       152             5                   19.2        0
179074  2019    11415       2       154             5                   19.3        0
179075  2019    11415       2       155             6                   19.4        0
179076  2019    11415       2       157             6                   19.5        0 
179077  2019    11415       2       157             7                   19.6        0

111972 rows × 7 columns

innings_score is new column created by me (given default value 0). I want to update it. The values that I want to enter in it are the results of df.groupby below.

In[]:
df.groupby(['season', 'match_id', 'inning'])['sum_total_runs'].max()

Out[]:
season  match_id  inning
2008    60        1         222
                  2          82
        61        1         240
                  2         207
        62        1         129
                           ... 
2019    11413     2         170
        11414     1         155
                  2         162
        11415     1         152
                  2         157
Name: sum_total_runs, Length: 1276, dtype: int64

I want innings_score to be like:

        season  match_id    inning  sum_total_runs  sum_total_wickets   over/ball   innings_score
32      2008    60          1       61              0                   5.1         222
33      2008    60          1       61              1                   5.2         222
34      2008    60          1       61              1                   5.3         222
35      2008    60          1       61              1                   5.4         222
36      2008    60          1       61              1                   5.5         222
...     ...     ...         ...     ...             ...                 ...         ...
179073  2019    11415       2       152             5                   19.2        157
179074  2019    11415       2       154             5                   19.3        157
179075  2019    11415       2       155             6                   19.4        157
179076  2019    11415       2       157             6                   19.5        157
179077  2019    11415       2       157             7                   19.6        157

111972 rows × 7 columns

nikeros · Accepted Answer

I would use assign. Starting from a simple example:

import pandas as pd

dt = pd.DataFrame({"name1":["A", "A", "B", "B", "C", "C"], "name2":["C", "C", "C", "D", "D", "D"], "value":[1, 2, 3, 4, 5, 6]})
grouping_variables = ["name1", "name2"]
dt = dt.set_index(grouping_variables)
dt = dt.assign(new_column=dt.groupby(grouping_variables)["value"].max())

As you can see, you set your grouping_variables as indeces before running the assignment.

You can always reset the index at the end if you don't want to keep the grouping_variables indexed dataframe:

dt.reset_index()

Entering df.groupby....max() result into a new column. Pandas

Answers (2)

Related Questions