Reputation: 2259
I have the following dataframe in pandas:
Visitor Home Visitor_rtg Home_rtg
Sixers Celtics 83.4 100.7
Thunder Warriors 97.7 105.6
Bucks Hornets 110.2 109.2
Pelicans Sixers 88.1 89.2
Hornets Wizards 90.1 102.3
What I would like to do is keep the rolling mean of rtg for each team, regardless of if they are home or visitor. So for example in the fourth row the sixers home_rtg would be 86.3
because (83.4+89.2) / 2
. And the final row the Hornets would be 99.65
because (109.2+90.1)/2
I have found df.rolling()
but cant figure out how to get this to only calculate per team. The ordering also matters because I want to track the rating progression throughout the season.
Expected output:
Visitor Home Visitor_rtg Home_rtg
Sixers Celtics 83.4 100.7
Thunder Warriors 97.7 105.6
Bucks Hornets 110.2 109.2
Pelicans Sixers 88.1 *86.3
Hornets Wizards *99.65 102.3
Upvotes: 1
Views: 412
Reputation: 59304
stack
+concat
+groupby
+mean
IIUC, you can stack
your dataframe, and groupby
+mean
the values. No need for rolling
f = df[['Visitor', 'Home']].stack().reset_index(drop=True)
s = df[['Visitor_rtg', 'Home_rtg']].stack().reset_index(drop=True)
pd.concat([f, s], axis=1).groupby(0)[1].mean()
Outputs
Bucks 110.2
Celtics 100.7
Hornet 109.2
Pelicans 88.1
Sixers 86.3
Thunder 97.7
Warriors 105.6
Name: 1, dtype: float64
To get your desired output, you can use expanding
(not rolling) to get the mean and transform
to broadcast back to each original row. In the end, just reshape
underlying values of your dataframe
to get final output
df2 = pd.concat([f, s], axis=1)
df2[1] = df2.groupby(0).transform(lambda s: s.expanding().mean())
pd.DataFrame(df2.values.reshape(-1,4), columns=df.columns)
Visitor Home Visitor_rtg Home_rtg
0 Sixers 83.4 Celtics 100.7
1 Thunder 97.7 Warriors 105.6
2 Bucks 110.2 Hornets 109.2
3 Pelicans 88.1 Sixers 86.3
4 Hornets 99.65 Wizards 102.3
Upvotes: 1
Reputation: 51185
You can use reshape
on the underlying numpy
arrays for an efficient solution.
Step 1
There is quite a bit of reshaping involved, and finally putting the data into an empty
numpy array.
home = df.iloc[:, [1,3]]
away = df.iloc[:, [0,2]]
total = np.empty((home.shape[0] * 2, home.shape[1]), dtype=object)
total[::2] = away
total[1::2] = home
Step 2
Now that we have the data formatted, we can use expanding.mean
and a bit more reshaping
d = pd.DataFrame({'team': total[:, 0], 'score': total[:, 1].astype(float)})
out = d.groupby('team').score.expanding().mean().reset_index().sort_values(by='level_1')
pd.DataFrame(
out[['team', 'score']].values.reshape(-1, 4)[:, [0,2,1,3]],
columns=df.columns
)
Visitor Home Visitor_rtg Home_rtg
0 Sixers Celtics 83.4 100.7
1 Thunder Warriors 97.7 105.6
2 Bucks Hornets 110.2 109.2
3 Pelicans Sixers 88.1 86.3
4 Hornets Wizards 99.65 102.3
Upvotes: 0
Reputation: 13999
A not particularly pleasant solution:
df = pd.DataFrame([['Sixers', 'Celtics', 83.4, 100.7],
['Pelicans', 'Sixers', 88.1, 89.2]],
columns=['Visitor', 'Home', 'Visitor_rtg', 'Home_rtg'])
melted = df.melt(['Visitor', 'Home'])
pd.concat([melted.loc[melted['variable'] == 'Visitor_rtg', ['Visitor', 'value']].rename(columns={'Visitor': 'Team'}),
melted.loc[melted['variable'] == 'Home_rtg', ['Home', 'value']].rename(columns={'Home': 'Team'})],
axis=0,
ignore_index=True
).groupby('Team').mean()
Upvotes: 0