Reputation: 79

Pandas groupby on multi-columns and broadcast the result to the original dataframe

I have a pandas dataframe of the form:

      bowler    inning  wickets Total_wickets   matches balls
0   SL Malinga     1      69         143          44    4078
1   SL Malinga     2      74         143          54    4735
2   A Mishra       1      48         124          50    3908
3   A Mishra       2      76         124          62    4930
4   DJ Bravo       1      61         122          48    3887

I would like to groupby this df on 'bowler' and 'inning' and perform some computation on 'wickets' and 'balls' column, then broadcast it to the same df as new column. One of the ways, I attempted was to use transform such as:

df_bowler['strike rate'] = df_bowler.groupby(['bowler','inning']).transform(lambda x : x['balls']/x['wickets'])

This results in a keyError exception :

KeyError: ('balls', 'occurred at index wickets')

I accomplished what I needed by using apply followed by merge such as:

df_strRate = df_bowler.groupby(['bowler','inning']).apply(lambda x:x['balls']/x['wickets']).reset_index(level=2,drop=True).reset_index(name='strike rate')
df_bowler = df_bowler.merge(df_strRate,on=['bowler','inning'])

However, this seems like a roundabout way of doing it. I would like to know why transform fails in this case. Any pointers?

Thanks.

Upvotes: 1

Answers (3)

jezrael

Reputation: 862481

There is problem GroupBy.transform function if not defined column in [] first working with each Series separately - so cannot working with 2 columns together, impossible divide them:

def f(x):
    print (x)

2    48
Name: wickets, dtype: int64
2    124
Name: Total_wickets, dtype: int64
2    50
Name: matches, dtype: int64
2    3908
Name: balls, dtype: int64

df = df_bowler.groupby(['bowler','inning']).transform(f)

If define column in []:

def f(x):
    print (x)

2    3908
Name: (A Mishra, 1), dtype: int64
3    4930
Name: (A Mishra, 2), dtype: int64
4    3887
Name: (DJ Bravo, 1), dtype: int64
0    4078
Name: (SL Malinga, 1), dtype: int64
1    4735
Name: (SL Malinga, 2), dtype: int64


df = df_bowler.groupby(['bowler','inning'])['balls'].transform(f)

Same way working DataFrameGroupBy.agg function.

Conclusion:

If want working with data by groups need GroupBy.apply:

def f(x):
    print (x)

     bowler  inning  wickets  Total_wickets  matches  balls
2  A Mishra       1       48            124       50   3908
     bowler  inning  wickets  Total_wickets  matches  balls
2  A Mishra       1       48            124       50   3908
     bowler  inning  wickets  Total_wickets  matches  balls
3  A Mishra       2       76            124       62   4930
     bowler  inning  wickets  Total_wickets  matches  balls


df = df_bowler.groupby(['bowler','inning']).apply(f)

Upvotes: 1

Satya

Reputation: 5907

Edit:

Try the below method using apply()

df = df.merge(df.groupby(['bowler', 'inning']).apply(lambda x : sum(x['balls']/x['wickets')]).reset_index(), on=['bowler', 'inning']).rename(columns={0:'Mycolumn'})
#If you don't want a rename  then new resulted column will be named as 0. As per your wish, use it/discard rename part.

Or, if you want simple column operations, I prefer Cory's 2nd Option.

Upvotes: 0

Cory Madden

Reputation: 5193

Your transform is failing because you're applying it along the wrong axis and you need to use an aggregation such as sum() first. Check this out:

In [83]: df.groupby(['bowler', 'inning']).sum().transform(lambda x : x['balls'].astype(float)/x['wickets'].astype(float), axis=1)
Out[83]: 
bowler      inning
A Mishra    1         81.416667
            2         64.868421
DJ Bravo    1         63.721311
SL Malinga  1         59.101449
            2         63.986486
dtype: float64

But you may as well do:

In [88]: df['strike_rate'] = df.balls / df.wickets
In [89]: df
Out[89]: 
       bowler  inning  wickets  Total_wickets  matches  balls  strike_rate
0  SL Malinga       1       69            143       44   4078    59.101449
1  SL Malinga       2       74            143       54   4735    63.986486
2    A Mishra       1       48            124       50   3908    81.416667
3    A Mishra       2       76            124       62   4930    64.868421
4    DJ Bravo       1       61            122       48   3887    63.721311

Upvotes: 2

Pandas groupby on multi-columns and broadcast the result to the original dataframe

Answers (3)

Edit:

Related Questions