Reputation: 79
I have a pandas dataframe of the form:
bowler inning wickets Total_wickets matches balls
0 SL Malinga 1 69 143 44 4078
1 SL Malinga 2 74 143 54 4735
2 A Mishra 1 48 124 50 3908
3 A Mishra 2 76 124 62 4930
4 DJ Bravo 1 61 122 48 3887
I would like to groupby this df on 'bowler' and 'inning' and perform some computation on 'wickets' and 'balls' column, then broadcast it to the same df as new column. One of the ways, I attempted was to use transform such as:
df_bowler['strike rate'] = df_bowler.groupby(['bowler','inning']).transform(lambda x : x['balls']/x['wickets'])
This results in a keyError exception :
KeyError: ('balls', 'occurred at index wickets')
I accomplished what I needed by using apply followed by merge such as:
df_strRate = df_bowler.groupby(['bowler','inning']).apply(lambda x:x['balls']/x['wickets']).reset_index(level=2,drop=True).reset_index(name='strike rate')
df_bowler = df_bowler.merge(df_strRate,on=['bowler','inning'])
However, this seems like a roundabout way of doing it. I would like to know why transform fails in this case. Any pointers?
Thanks.
Upvotes: 1
Views: 1771
Reputation: 862481
There is problem GroupBy.transform
function if not defined column in []
first working with each Series
separately - so cannot working with 2 columns together, impossible divide them:
def f(x):
print (x)
2 48
Name: wickets, dtype: int64
2 124
Name: Total_wickets, dtype: int64
2 50
Name: matches, dtype: int64
2 3908
Name: balls, dtype: int64
df = df_bowler.groupby(['bowler','inning']).transform(f)
If define column in []
:
def f(x):
print (x)
2 3908
Name: (A Mishra, 1), dtype: int64
3 4930
Name: (A Mishra, 2), dtype: int64
4 3887
Name: (DJ Bravo, 1), dtype: int64
0 4078
Name: (SL Malinga, 1), dtype: int64
1 4735
Name: (SL Malinga, 2), dtype: int64
df = df_bowler.groupby(['bowler','inning'])['balls'].transform(f)
Same way working DataFrameGroupBy.agg
function.
Conclusion:
If want working with data by groups need GroupBy.apply
:
def f(x):
print (x)
bowler inning wickets Total_wickets matches balls
2 A Mishra 1 48 124 50 3908
bowler inning wickets Total_wickets matches balls
2 A Mishra 1 48 124 50 3908
bowler inning wickets Total_wickets matches balls
3 A Mishra 2 76 124 62 4930
bowler inning wickets Total_wickets matches balls
df = df_bowler.groupby(['bowler','inning']).apply(f)
Upvotes: 1
Reputation: 5907
Try the below method using apply()
df = df.merge(df.groupby(['bowler', 'inning']).apply(lambda x : sum(x['balls']/x['wickets')]).reset_index(), on=['bowler', 'inning']).rename(columns={0:'Mycolumn'})
#If you don't want a rename then new resulted column will be named as 0. As per your wish, use it/discard rename part.
Or, if you want simple column operations, I prefer Cory's 2nd Option.
Upvotes: 0
Reputation: 5193
Your transform is failing because you're applying it along the wrong axis and you need to use an aggregation such as sum()
first. Check this out:
In [83]: df.groupby(['bowler', 'inning']).sum().transform(lambda x : x['balls'].astype(float)/x['wickets'].astype(float), axis=1)
Out[83]:
bowler inning
A Mishra 1 81.416667
2 64.868421
DJ Bravo 1 63.721311
SL Malinga 1 59.101449
2 63.986486
dtype: float64
But you may as well do:
In [88]: df['strike_rate'] = df.balls / df.wickets
In [89]: df
Out[89]:
bowler inning wickets Total_wickets matches balls strike_rate
0 SL Malinga 1 69 143 44 4078 59.101449
1 SL Malinga 2 74 143 54 4735 63.986486
2 A Mishra 1 48 124 50 3908 81.416667
3 A Mishra 2 76 124 62 4930 64.868421
4 DJ Bravo 1 61 122 48 3887 63.721311
Upvotes: 2