GNMO11
GNMO11

Reputation: 2259

Pandas Rolling Mean Depending on Row Value

I have the following dataframe in pandas:

Visitor    Home     Visitor_rtg    Home_rtg
 Sixers    Celtics     83.4         100.7
 Thunder   Warriors    97.7         105.6
 Bucks     Hornets     110.2        109.2
 Pelicans  Sixers      88.1         89.2
 Hornets   Wizards     90.1         102.3

What I would like to do is keep the rolling mean of rtg for each team, regardless of if they are home or visitor. So for example in the fourth row the sixers home_rtg would be 86.3 because (83.4+89.2) / 2. And the final row the Hornets would be 99.65 because (109.2+90.1)/2 I have found df.rolling() but cant figure out how to get this to only calculate per team. The ordering also matters because I want to track the rating progression throughout the season.

Expected output:

Visitor    Home     Visitor_rtg    Home_rtg
 Sixers    Celtics     83.4         100.7
 Thunder   Warriors    97.7         105.6
 Bucks     Hornets     110.2        109.2
 Pelicans  Sixers      88.1         *86.3
 Hornets   Wizards     *99.65        102.3

Upvotes: 1

Views: 412

Answers (3)

rafaelc
rafaelc

Reputation: 59304

stack+concat+groupby+mean

IIUC, you can stack your dataframe, and groupby+mean the values. No need for rolling

f = df[['Visitor', 'Home']].stack().reset_index(drop=True)
s = df[['Visitor_rtg', 'Home_rtg']].stack().reset_index(drop=True)
pd.concat([f, s], axis=1).groupby(0)[1].mean()

Outputs

Bucks       110.2
Celtics     100.7
Hornet      109.2
Pelicans     88.1
Sixers       86.3
Thunder      97.7
Warriors    105.6
Name: 1, dtype: float64

Edited answer

To get your desired output, you can use expanding (not rolling) to get the mean and transform to broadcast back to each original row. In the end, just reshape underlying values of your dataframe to get final output

df2 = pd.concat([f, s], axis=1)
df2[1] = df2.groupby(0).transform(lambda s: s.expanding().mean())
pd.DataFrame(df2.values.reshape(-1,4), columns=df.columns)


    Visitor     Home    Visitor_rtg Home_rtg
0   Sixers      83.4    Celtics     100.7
1   Thunder     97.7    Warriors    105.6
2   Bucks       110.2   Hornets     109.2
3   Pelicans    88.1    Sixers      86.3
4   Hornets     99.65   Wizards     102.3

Upvotes: 1

user3483203
user3483203

Reputation: 51185

You can use reshape on the underlying numpy arrays for an efficient solution.


Step 1
There is quite a bit of reshaping involved, and finally putting the data into an empty numpy array.

home = df.iloc[:, [1,3]]
away = df.iloc[:, [0,2]]
total = np.empty((home.shape[0] * 2, home.shape[1]), dtype=object)
total[::2] = away
total[1::2] = home

Step 2
Now that we have the data formatted, we can use expanding.mean and a bit more reshaping

d = pd.DataFrame({'team': total[:, 0], 'score': total[:, 1].astype(float)}) 
out = d.groupby('team').score.expanding().mean().reset_index().sort_values(by='level_1') 

pd.DataFrame(
    out[['team', 'score']].values.reshape(-1, 4)[:, [0,2,1,3]],
    columns=df.columns
)

    Visitor      Home Visitor_rtg Home_rtg
0    Sixers   Celtics        83.4    100.7
1   Thunder  Warriors        97.7    105.6
2     Bucks   Hornets       110.2    109.2
3  Pelicans    Sixers        88.1     86.3
4   Hornets   Wizards       99.65    102.3

Upvotes: 0

BallpointBen
BallpointBen

Reputation: 13999

A not particularly pleasant solution:

df = pd.DataFrame([['Sixers', 'Celtics', 83.4, 100.7], 
                   ['Pelicans', 'Sixers', 88.1, 89.2]], 
                  columns=['Visitor', 'Home', 'Visitor_rtg', 'Home_rtg'])
melted = df.melt(['Visitor', 'Home'])
pd.concat([melted.loc[melted['variable'] == 'Visitor_rtg', ['Visitor', 'value']].rename(columns={'Visitor': 'Team'}), 
            melted.loc[melted['variable'] == 'Home_rtg', ['Home', 'value']].rename(columns={'Home': 'Team'})], 
           axis=0, 
           ignore_index=True
      ).groupby('Team').mean()

Upvotes: 0

Related Questions