Reputation: 81

Mean of the certain rows of columns

Update: Original question here: I need to get mean of total goals from H2H matches. Input:

Home   Away   Home_goals Away_goals 
------------------------------------
Team 1 Team 2 2          1          
Team 3 Team 4 3          5
Team 2 Team 1 5          3
Team 4 Team 3 1          5

Output:

Home   Away   Home_goals Away_goals Mean
------------------------------------------------------
Team 1 Team 2 2          1          5.5 ((2+1+5+3)/2) 
Team 3 Team 4 3          5          7 ((3+5+1+5)/2)
Team 2 Team 1 5          3          5.5 ((2+1+5+3)/2) 
Team 4 Team 3 1          5          7 ((3+5+1+5)/2)

Code below work fine but I came across another question. What if I have n matches between team 1 and team 2 and I want to calculate mean based on n-1 matches (without the last one). Can I just somehow change the code below?

a = np.sort(df[["Home", "Away"]], axis=1)
df['Mean'] = (pd.DataFrame(a, index=df.index)
                .assign(sum = df[['Home_goals','Away_goals']].sum(axis=1))
                .groupby([0,1])['sum']
                .transform('mean'))

Thanks

Upvotes: 0

Answers (2)

jezrael

Reputation: 863801

You can add iloc to lambda function in GroupBy.transform:

a = np.sort(df[["Home", "Away"]], axis=1)
df['Mean'] = (pd.DataFrame(a, index=df.index)
                .assign(sum = df[['Home_goals','Away_goals']].sum(axis=1))
                .groupby([0,1])['sum']
                .transform(lambda x: x.iloc[:-1].mean()))

Upvotes: 1

Roy2012

Reputation: 12543

The following produces a mean for every two-team combination while dropping the last game.

I added a couple of games to demonstrate that:

Home   Away   Home_goals Away_goals 
Team-1 Team-2 2          1          
Team-3 Team-4 3          5
Team-2 Team-1 5          3
Team-4 Team-3 1          5
Team-2 Team-1 10        10
Team-4 Team-3 10        10

Code:

a = np.sort(df[["Home", "Away"]], axis=1)
df = pd.concat([df, pd.DataFrame(a, columns=["team1", "team2"])], axis="columns")
df["sum"] = df[["Home_goals", "Away_goals"]].sum(axis="columns")

drop_last = df.groupby(["team1", "team2"]).apply(lambda x: x.iloc[:-1]).drop(["team1", "team2"], axis="columns").reset_index()
drop_last["mean"] = drop_last.groupby(["team1", "team2"])["sum"].transform("mean")
drop_last = drop_last[["team1", "team2", "mean"]]
drop_last = drop_last.drop_duplicates()

res = pd.merge(df, drop_last, on = ["team1", "team2"])
print(res)

result:

     Home    Away  Home_goals  Away_goals   team1   team2  sum  mean
0  Team-1  Team-2           2           1  Team-1  Team-2    3   5.5
1  Team-2  Team-1           5           3  Team-1  Team-2    8   5.5
2  Team-2  Team-1          10          10  Team-1  Team-2   20   5.5
3  Team-3  Team-4           3           5  Team-3  Team-4    8   7.0
4  Team-4  Team-3           1           5  Team-3  Team-4    6   7.0
5  Team-4  Team-3          10          10  Team-3  Team-4   20   7.0

Upvotes: 0

Mean of the certain rows of columns

Answers (2)

Related Questions