Reputation: 151
My data is related to "Cricket"
, sports game (like Baseball). It has 20 overs for each inning max
and each over has approx 6 balls
.
data:
season match_id inning sum_total_runs sum_total_wickets over/ball innings_score
32 2008 60 1 61 0 5.1 0
33 2008 60 1 61 1 5.2 0
34 2008 60 1 61 1 5.3 0
35 2008 60 1 61 1 5.4 0
36 2008 60 1 61 1 5.5 0
... ... ... ... ... ... ... ...
179073 2019 11415 2 152 5 19.2 0
179074 2019 11415 2 154 5 19.3 0
179075 2019 11415 2 155 6 19.4 0
179076 2019 11415 2 157 6 19.5 0
179077 2019 11415 2 157 7 19.6 0
111972 rows × 7 columns
innings_score
is new column created by me (given default value 0). I want to update it.
The values that I want to enter in it are the results of df.groupby
below.
In[]:
df.groupby(['season', 'match_id', 'inning'])['sum_total_runs'].max()
Out[]:
season match_id inning
2008 60 1 222
2 82
61 1 240
2 207
62 1 129
...
2019 11413 2 170
11414 1 155
2 162
11415 1 152
2 157
Name: sum_total_runs, Length: 1276, dtype: int64
I want innings_score
to be like:
season match_id inning sum_total_runs sum_total_wickets over/ball innings_score
32 2008 60 1 61 0 5.1 222
33 2008 60 1 61 1 5.2 222
34 2008 60 1 61 1 5.3 222
35 2008 60 1 61 1 5.4 222
36 2008 60 1 61 1 5.5 222
... ... ... ... ... ... ... ...
179073 2019 11415 2 152 5 19.2 157
179074 2019 11415 2 154 5 19.3 157
179075 2019 11415 2 155 6 19.4 157
179076 2019 11415 2 157 6 19.5 157
179077 2019 11415 2 157 7 19.6 157
111972 rows × 7 columns
Upvotes: 2
Views: 61
Reputation: 96
One way is to set those 3 columns as index and assign the groupby result as a new column and reset index after that.
While those columns are index, the grouby result and the dataframe both have similar index, so pandas will automatically match and insert the correct rows in the correct positions. Then reset index will turn them back into normal columns.
Something like this:
In [46]: df
Out[46]:
season match_id inning sum_total_runs sum_total_wickets over/ball
0 2008 60 1 61 0 5.1
1 2008 60 1 61 1 5.2
2 2008 60 1 61 1 5.3
3 2008 60 1 61 1 5.4
4 2008 60 1 61 1 5.5
5 2019 11415 2 152 5 19.2
6 2019 11415 2 154 5 19.3
7 2019 11415 2 155 6 19.4
8 2019 11415 2 157 6 19.5
9 2019 11415 2 157 7 19.6
In [47]: df.set_index(['season', 'match_id', 'inning']).assign(innings_score=df.groupby(['season', 'match_id', 'inning'])['sum_total_runs'].max()).reset_index()
Out[47]:
season match_id inning sum_total_runs sum_total_wickets over/ball innings_score
0 2008 60 1 61 0 5.1 61
1 2008 60 1 61 1 5.2 61
2 2008 60 1 61 1 5.3 61
3 2008 60 1 61 1 5.4 61
4 2008 60 1 61 1 5.5 61
5 2019 11415 2 152 5 19.2 157
6 2019 11415 2 154 5 19.3 157
7 2019 11415 2 155 6 19.4 157
8 2019 11415 2 157 6 19.5 157
9 2019 11415 2 157 7 19.6 157
Upvotes: 1
Reputation: 3379
I would use assign
. Starting from a simple example:
import pandas as pd
dt = pd.DataFrame({"name1":["A", "A", "B", "B", "C", "C"], "name2":["C", "C", "C", "D", "D", "D"], "value":[1, 2, 3, 4, 5, 6]})
grouping_variables = ["name1", "name2"]
dt = dt.set_index(grouping_variables)
dt = dt.assign(new_column=dt.groupby(grouping_variables)["value"].max())
As you can see, you set your grouping_variables
as indeces before running the assignment.
You can always reset the index at the end if you don't want to keep the grouping_variables
indexed dataframe:
dt.reset_index()
Upvotes: 2