Pandas Group By Multiple Columns and Calculate Standard Deviation

Question

I have a pandas DataFrame of NBA player statistics covering multiple seasons and teams. It looks like this:

Year         Team          Player            PTS/G 
2018         Lakers        Lebron James      27.6
2018         Lakers        Kyle Kuzma        10.3
2019         Rockets       James Harden      25.5
2019         Rockets       Russel Westbrook  23.2

I want to create a new column called 'PTS Dev' that holds the standard deviation of PTS/G for each team and year. I then plan to analyze where each player falls relative to that deviation. This is my attempt to calculate the column:

final_data['PTS Dev'] = final_data.groupby('Team', 'Year')['PTS/G'].std()

henrywongkk · Accepted Answer

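Use groupby with transform, and pass both grouping columns as a list. transform returns one value per original row, aligned to the DataFrame's index, so the result can be assigned directly as a new column.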
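
A minimal sketch of how transform differs from a plain aggregation, using the sample rows from the question. The names per_group and per_row are illustrative and not from the original post.

```python
import pandas as pd

# Sample rows copied from the question
final_data = pd.DataFrame({
    "Year": [2018, 2018, 2019, 2019],
    "Team": ["Lakers", "Lakers", "Rockets", "Rockets"],
    "Player": ["Lebron James", "Kyle Kuzma", "James Harden", "Russel Westbrook"],
    "PTS/G": [27.6, 10.3, 25.5, 23.2],
})

# Multiple grouping keys go in a single list
grouped = final_data.groupby(["Team", "Year"])["PTS/G"]

# Aggregation: one value per (Team, Year) group, indexed by the group keys
per_group = grouped.std()

# transform: one value per original row, indexed like final_data
per_row = grouped.transform("std")

print(len(per_group), len(per_row))
print(per_row.index.equals(final_data.index))
```

The aggregated result is indexed by (Team, Year) rather than by the frame's row labels, which is why it does not line up with final_data when assigned as a column.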

final_data['PTS Dev'] = final_data.groupby(['Team', 'Year'])['PTS/G'].transform('std')
final_data
Out[9]: 
   Year     Team            Player  PTS/G    PTS Dev
0  2018   Lakers      Lebron James   27.6  12.232947
1  2018   Lakers        Kyle Kuzma   10.3  12.232947
2  2019  Rockets      James Harden   25.5   1.626346
3  2019  Rockets  Russel Westbrook   23.2   1.626346
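
For the follow-up step of seeing where a player sits relative to that deviation, one possible approach is a per-group z-score. This is only a sketch: the 'PTS Z' column and the choice of a z-score are assumptions, not something the question specifies.

```python
import pandas as pd

# Sample rows copied from the question
final_data = pd.DataFrame({
    "Year": [2018, 2018, 2019, 2019],
    "Team": ["Lakers", "Lakers", "Rockets", "Rockets"],
    "Player": ["Lebron James", "Kyle Kuzma", "James Harden", "Russel Westbrook"],
    "PTS/G": [27.6, 10.3, 25.5, 23.2],
})

grouped = final_data.groupby(["Team", "Year"])["PTS/G"]

# Per-row group statistics; std uses the sample formula (ddof=1) by default
group_std = grouped.transform("std")
group_mean = grouped.transform("mean")

final_data["PTS Dev"] = group_std

# Hypothetical column: standard deviations above or below the team-season mean
final_data["PTS Z"] = (final_data["PTS/G"] - group_mean) / group_std

print(final_data)
```

A team-season with only one player has an undefined sample standard deviation, so 'PTS Dev' and 'PTS Z' will be NaN for that row.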
