jon

Reputation: 359

Pandas using map or apply to make a new column from adjustments using a dictionary

I have data from a sporting event, and each home arena is known to introduce a bias that I want to adjust for. I have already created a dictionary where the arena is the key and the value is the adjustment I want to make.

So for each row, I want to take the home team, get the adjustment, and then subtract that from the distance column. I have the following code but I cannot seem to get it working.

#Making the dictionary, this is working properly
teams = df.home_team.unique().tolist()
adj_shot_dict = {}
for team in teams:
    df_temp = df[df.home_team == team]
    average = round(df_temp.event_distance.mean(),2)
    adj_shot_dict[team] = average

def make_adjustment(df):
    team = df.home_team
    distance = df.event_distance
    adj_dist = distance - adj_shot_dict[team]
    return adj_dist

df['adj_dist'] = df['event_distance'].apply(make_adjustment)

Upvotes: 0

Views: 47

Answers (1)

Corralien

Reputation: 120391

IIUC, you already have the dict and simply want to subtract the matching adj_shot_dict value from the event_distance column:

df['adj_dist'] = df['event_distance'] - df['home_team'].map(adj_shot_dict)
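As a minimal sketch of why this works, assuming a tiny made-up DataFrame (the team names and distances below are illustrative, not the asker's data): `Series.map` with a dict looks up each `home_team` value and returns a Series aligned with the original index, so the subtraction is vectorized. The `apply` call in the question most likely fails because `df['event_distance'].apply(...)` passes each scalar distance to the function, and a scalar has no `home_team` attribute. A row-wise `df.apply(..., axis=1)` should give the same result, though it is typically slower.

```python
import pandas as pd

# Illustrative data only; not taken from the question.
df = pd.DataFrame({
    "home_team": ["team1", "team2", "team2", "team3"],
    "event_distance": [10.0, 20.0, 50.0, 60.0],
})

# Build the same kind of dictionary as in the question: team -> rounded mean distance.
adj_shot_dict = (
    df.groupby("home_team")["event_distance"].mean().round(2).to_dict()
)

# Vectorized version: map each home team to its adjustment, then subtract.
df["adj_dist"] = df["event_distance"] - df["home_team"].map(adj_shot_dict)

# Row-wise alternative: the function receives a whole row because axis=1.
def make_adjustment(row):
    return row["event_distance"] - adj_shot_dict[row["home_team"]]

adj_rowwise = df.apply(make_adjustment, axis=1)
```

Note that `map` yields NaN for any team missing from the dictionary, whereas the row-wise lookup would raise a `KeyError`.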

Old answer

Group by home_team, compute the average of event_distance, then subtract the result from event_distance:

df['adj_dist'] = df['event_distance'] \
                 - df.groupby('home_team')['event_distance'] \
                     .transform('mean').round(2)

# OR

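# transform keeps the original index, so the result aligns with df
# (groupby.apply adds the group key to the index in pandas >= 2.0)
df['adj_dist'] = df.groupby('home_team')['event_distance'] \
                   .transform(lambda x: x - x.mean().round(2))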
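To illustrate the group-by approach, here is a small sketch on made-up data (the values are assumptions for demonstration only). `transform('mean')` returns one value per original row rather than one per group, which is why it can be subtracted from the column directly, and it should match the dictionary-plus-`map` result.

```python
import pandas as pd

# Illustrative data only; not taken from the question.
df = pd.DataFrame({
    "home_team": ["team1", "team2", "team2", "team3"],
    "event_distance": [10.0, 20.0, 50.0, 60.0],
})

# One rounded group mean per row, aligned with df's index.
group_mean = df.groupby("home_team")["event_distance"].transform("mean").round(2)
adj_transform = df["event_distance"] - group_mean

# Equivalent dictionary-based route for comparison.
adj_dict = df.groupby("home_team")["event_distance"].mean().round(2).to_dict()
adj_map = df["event_distance"] - df["home_team"].map(adj_dict)
```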

Performance

>>> len(df)
60000

>>> df.sample(5)
  home_team  event_distance
5     team3              60
4     team2              50
1     team2              20
1     team2              20
0     team1              10

def loop():
    teams = df.home_team.unique().tolist()
    adj_shot_dict = {}
    for team in teams:
        df_temp = df[df.home_team == team]
        average = round(df_temp.event_distance.mean(),2)
        adj_shot_dict[team] = average

def loop2():
    df.groupby('home_team')['event_distance'].transform('mean').round(2)

>>> %timeit loop()
13.5 ms ± 194 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

>>> %timeit loop2()
3.62 ms ± 167 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# Total process
>>> %timeit df['event_distance'] - df.groupby('home_team')['event_distance'].transform('mean').round(2)
3.7 ms ± 21.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Upvotes: 2
