My dataframe is as follows:
player_id season_id game_id points mean_to_date
200 21999 29900007 10 0
200 21999 29900023 20 0
200 21200 29900042 10 0
200 21200 29900059 20 0
200 21200 29900081 30 0
300 21999 29900089 10 0
300 22111 29900108 10 0
300 22111 29900118 20 0
300 22111 29900143 30 0
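To make the examples reproducible, the frame above can be rebuilt with a plain DataFrame constructor. This is a sketch: the values are copied from the table, and the variable name frame matches the question's code.

```python
import pandas as pd

# Rebuild the sample frame from the table above so later snippets can be run as-is.
frame = pd.DataFrame({
    'player_id': [200, 200, 200, 200, 200, 300, 300, 300, 300],
    'season_id': [21999, 21999, 21200, 21200, 21200, 21999, 22111, 22111, 22111],
    'game_id': [29900007, 29900023, 29900042, 29900059, 29900081,
                29900089, 29900108, 29900118, 29900143],
    'points': [10, 20, 10, 20, 30, 10, 10, 20, 30],
    'mean_to_date': [0] * 9,
})
print(frame)
```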
I split it into groups with:
grouped = frame.groupby(['player_id', 'season_id'])
I have the following function that I want to apply to each group:
import logging

def previous_mean(player_season):
    avgs = {}
    i = 0
    for idx, game in player_season.iterrows():
        gamenum = i + 1
        if gamenum == 1:
            avgs[1] = 0
        elif gamenum == 2:
            # relies on a contiguous integer index, so idx-1 is the previous game
            avgs[2] = player_season.at[idx-1, 'points']
        elif gamenum > 2:
            logging.debug("gamenum is {0}".format(gamenum))
            pts = player_season.at[idx-1, 'points']
            # running mean: (previous mean * previous count + new points) / new count
            avgs[gamenum] = (avgs.get(i)*(i-1) + pts)/i
        i += 1
    return list(avgs.values())
Calling
grouped.apply(previous_mean)
results in the following:
player_id season_id
200 21200 [0, 10, 15.0]
21999 [0, 10]
300 21999 [0]
22111 [0, 10, 15.0]
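The result above lists season 21200 before 21999 for player 200 because groupby sorts the group keys by default, so one list per group comes back in sorted key order rather than in the row order of frame. That is why the lists cannot simply be flattened and assigned positionally. A small sketch showing the two orderings (sort=False is a standard groupby parameter):

```python
import pandas as pd

# Same sample data as the question (only the columns needed here).
frame = pd.DataFrame({
    'player_id': [200, 200, 200, 200, 200, 300, 300, 300, 300],
    'season_id': [21999, 21999, 21200, 21200, 21200, 21999, 22111, 22111, 22111],
    'points': [10, 20, 10, 20, 30, 10, 10, 20, 30],
})

# Default (sort=True): keys come back sorted, so (200, 21200) precedes (200, 21999).
sorted_keys = frame.groupby(['player_id', 'season_id']).size().index.tolist()

# sort=False: groups keep their order of first appearance in the frame.
appearance_keys = (frame.groupby(['player_id', 'season_id'], sort=False)
                   .size().index.tolist())

print(sorted_keys)
print(appearance_keys)
```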
How do I make the results of the apply operation the values of the "mean_to_date" column? That is, the mean_to_date for player 200, season 21999 should be 0 and 10; for player 200, season 21200 it should be 0, 10, and 15; and so forth. Note that mean_to_date is the mean prior to the game: before the first game it is 0, and before the second game it is the points scored in the first game.
Also, the "previous_mean" function is ugly, and there is probably a more efficient way to accomplish the same thing, but I couldn't figure it out.
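One way to see both parts of the problem is a tidier per-group function that keeps a running total instead of looking up idx-1 labels, and returns a Series indexed like the group so that groupby(...).transform can put the values back into the right rows. This is a hypothetical rewrite for illustration, not the asker's code; it assumes mean_to_date is defined exactly as described above (0 before the first game, otherwise the mean of the earlier games in that player-season).

```python
import pandas as pd

frame = pd.DataFrame({
    'player_id': [200, 200, 200, 200, 200, 300, 300, 300, 300],
    'season_id': [21999, 21999, 21200, 21200, 21200, 21999, 22111, 22111, 22111],
    'points': [10, 20, 10, 20, 30, 10, 10, 20, 30],
})

def previous_mean(points):
    """Mean of the earlier games in one player-season; 0 before the first game.

    `points` is the points Series of a single (player_id, season_id) group.
    The result reuses that Series' index so pandas can align it back to frame.
    """
    means = []
    total = 0
    for count, pts in enumerate(points):
        # count is the number of games already played before this one
        means.append(total / count if count else 0)
        total += pts
    return pd.Series(means, index=points.index, dtype=float)

# transform returns one value per original row, in frame's original row order.
frame['mean_to_date'] = (frame.groupby(['player_id', 'season_id'])['points']
                         .transform(previous_mean))
print(frame)
```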
IIUC, you can use expanding_mean, shift the data down by 1 with shift, fill the resulting NaN with 0 using fillna, and assign the result to the column mean_to_date:
import pandas as pd

print(frame)
# player_id season_id game_id points mean_to_date
#0 200 21999 29900007 10 0
#1 200 21999 29900023 20 0
#2 200 21200 29900042 10 0
#3 200 21200 29900059 20 0
#4 200 21200 29900081 30 0
#5 300 21999 29900089 10 0
#6 300 22111 29900108 10 0
#7 300 22111 29900118 20 0
#8 300 22111 29900143 30 0
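# pd.expanding_mean is the older pandas API; newer versions spell it
# x['points'].expanding(min_periods=1).mean()
frame['mean_to_date'] = (frame.groupby(['player_id', 'season_id'])
                         .apply(lambda x: pd.expanding_mean(x['points'], 1)
                                          .shift(1)
                                          .fillna(0))
                         .reset_index(drop=True))
print(frame)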
# player_id season_id game_id points mean_to_date
#0 200 21999 29900007 10 0
#1 200 21999 29900023 20 10
#2 200 21200 29900042 10 0
#3 200 21200 29900059 20 10
#4 200 21200 29900081 30 15
#5 300 21999 29900089 10 0
#6 300 22111 29900108 10 0
#7 300 22111 29900118 20 10
#8 300 22111 29900143 30 15
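pd.expanding_mean was deprecated and later removed from pandas, so the snippet above fails on current releases. A minimal sketch of the same idea with the method-based Series.expanding API, using groupby(...).transform so the result stays aligned with frame's rows regardless of group order. The second variant, built from cumsum and cumcount, is an added alternative (not part of the answer) that avoids a Python-level lambda: the sum of earlier games divided by the number of earlier games, where 0/0 on a group's first game yields NaN and is filled with 0.

```python
import pandas as pd

frame = pd.DataFrame({
    'player_id': [200, 200, 200, 200, 200, 300, 300, 300, 300],
    'season_id': [21999, 21999, 21200, 21200, 21200, 21999, 22111, 22111, 22111],
    'points': [10, 20, 10, 20, 30, 10, 10, 20, 30],
})

points_by_group = frame.groupby(['player_id', 'season_id'])['points']

# Expanding mean per group, shifted down one game; the first game of each
# group has no earlier games, so its NaN is filled with 0.
frame['mean_to_date'] = points_by_group.transform(
    lambda s: s.expanding(min_periods=1).mean().shift(1).fillna(0))

# Vectorized alternative: (running total minus current game) / games so far.
earlier_total = points_by_group.cumsum() - frame['points']
earlier_count = points_by_group.cumcount()
vectorized = (earlier_total / earlier_count).fillna(0)

print(frame)
```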