Eric Truett

Reputation: 3010

Pandas: apply returns list

My dataframe is as follows:

player_id  season_id   game_id  points  mean_to_date
      200      21999  29900007      10             0
      200      21999  29900023      20             0
      200      21200  29900042      10             0
      200      21200  29900059      20             0
      200      21200  29900081      30             0
      300      21999  29900089      10             0
      300      22111  29900108      10             0
      300      22111  29900118      20             0
      300      22111  29900143      30             0

I split it into groups with:

grouped = frame.groupby(['player_id', 'season_id'])     

I have the following function that I want to apply to each group:

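import logging

def previous_mean(player_season):
    # Maps the 1-based game number to the mean of all earlier games in the group.
    avgs = {}
    i = 0
    for idx, game in player_season.iterrows():
        gamenum = i + 1

        if gamenum == 1:
            # No earlier games, so the mean to date is defined as 0.
            avgs[1] = 0

        elif gamenum == 2:
            # One earlier game; idx-1 assumes a consecutive integer index.
            avgs[2] = player_season.at[idx-1, 'points']

        elif gamenum > 2:
            logging.debug("gamenum is {0}".format(gamenum))
            pts = player_season.at[idx-1, 'points']
            # Fold the previous game's points into the running mean.
            avgs[gamenum] = (avgs.get(i)*(i-1) + pts)/i

        i += 1

    return list(avgs.values())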

Calling

grouped.apply(previous_mean)

results in the following:

player_id  season_id
200        21200        [0, 10, 15.0]
           21999              [0, 10]
300        21999                  [0]
           22111        [0, 10, 15.0]

How do I make the results of the apply operation the values of the "mean_to_date" column? That is, the mean_to_date for player 200, season 21999 would be 0 and 10, then for player 200, season 21200 it would be 0, 10, and 15, and so forth. Note that the mean_to_date value represents the mean prior to the game, so before the 1st game it is zero, and before the second game it is the total from the first game.

Also, the "previous_mean" function is ugly and there is probably a more efficient way to accomplish the same end, but I couldn't figure it out.

Upvotes: 3

Views: 1981

Answers (1)

jezrael

Reputation: 863176

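IIUC, you can compute the running mean of points within each group with expanding_mean, move it down one row with shift(1) so that each game only sees the games before it, replace the resulting NaN for the first game with 0 using fillna, and assign the result to the mean_to_date column (pd.expanding_mean is the older spelling; pandas 0.18 and later use the .expanding().mean() method instead):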

print(frame)
#   player_id  season_id   game_id  points  mean_to_date
#0        200      21999  29900007      10             0
#1        200      21999  29900023      20             0
#2        200      21200  29900042      10             0
#3        200      21200  29900059      20             0
#4        200      21200  29900081      30             0
#5        300      21999  29900089      10             0
#6        300      22111  29900108      10             0
#7        300      22111  29900118      20             0
#8        300      22111  29900143      30             0


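# The whole chain is wrapped in parentheses so the trailing .reset_index line
# is parsed as part of the same expression.
frame['mean_to_date'] = (frame.groupby(['player_id', 'season_id'])
                              .apply(lambda x: pd.expanding_mean(x['points'], 1)
                                                 .shift(1)
                                                 .fillna(0))
                              # Drops the index so the values can be assigned back;
                              # this assumes rows come back in frame's original order.
                              .reset_index(drop=True))
print(frame)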

#   player_id  season_id   game_id  points  mean_to_date
#0        200      21999  29900007      10             0
#1        200      21999  29900023      20            10
#2        200      21200  29900042      10             0
#3        200      21200  29900059      20            10
#4        200      21200  29900081      30            15
#5        300      21999  29900089      10             0
#6        300      22111  29900108      10             0
#7        300      22111  29900118      20            10
#8        300      22111  29900143      30            15
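
On newer pandas versions, where pd.expanding_mean is no longer available, the same idea can be sketched with the .expanding() method and groupby(...).transform. This is only a minimal sketch under a few assumptions: the sample frame is rebuilt by hand from the question's data, the helper name mean_before_game is made up for illustration, and the result comes out as floats rather than the integers printed above.

```python
import pandas as pd

# Sample data copied from the question (mean_to_date is computed below).
frame = pd.DataFrame({
    'player_id': [200, 200, 200, 200, 200, 300, 300, 300, 300],
    'season_id': [21999, 21999, 21200, 21200, 21200,
                  21999, 22111, 22111, 22111],
    'game_id': [29900007, 29900023, 29900042, 29900059, 29900081,
                29900089, 29900108, 29900118, 29900143],
    'points': [10, 20, 10, 20, 30, 10, 10, 20, 30],
})


def mean_before_game(points):
    """Hypothetical helper: mean of all earlier games in one group."""
    # Expanding mean over the games so far, shifted down one row so each
    # game only sees earlier games; the first game has no history, so 0.
    return points.expanding(min_periods=1).mean().shift(1).fillna(0)


# transform returns a Series aligned with frame's original index,
# so no reset_index is needed to assign it back.
frame['mean_to_date'] = (
    frame.groupby(['player_id', 'season_id'])['points']
         .transform(mean_before_game)
)
print(frame)
```

Because transform keeps the original row index, the assignment does not depend on the order in which groupby visits the groups, which matters here since the rows for season 21999 appear before those for season 21200.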

Upvotes: 2
