Reputation: 81
Update: Original question here: I need to get mean of total goals from H2H matches. Input:
Home Away Home_goals Away_goals
------------------------------------
Team 1 Team 2 2 1
Team 3 Team 4 3 5
Team 2 Team 1 5 3
Team 4 Team 3 1 5
Output:
Home Away Home_goals Away_goals Mean
------------------------------------------------------
Team 1 Team 2 2 1 5.5 ((2+1+5+3)/2)
Team 3 Team 4 3 5 7 ((3+5+1+5)/2)
Team 2 Team 1 5 3 5.5 ((2+1+5+3)/2)
Team 4 Team 3 1 5 7 ((3+5+1+5)/2)
Code below work fine but I came across another question. What if I have n matches between team 1 and team 2 and I want to calculate mean based on n-1 matches (without the last one). Can I just somehow change the code below?
a = np.sort(df[["Home", "Away"]], axis=1)
df['Mean'] = (pd.DataFrame(a, index=df.index)
.assign(sum = df[['Home_goals','Away_goals']].sum(axis=1))
.groupby([0,1])['sum']
.transform('mean'))
Thanks
Upvotes: 0
Views: 73
Reputation: 863801
You can add iloc
to lambda function in GroupBy.transform
:
a = np.sort(df[["Home", "Away"]], axis=1)
df['Mean'] = (pd.DataFrame(a, index=df.index)
.assign(sum = df[['Home_goals','Away_goals']].sum(axis=1))
.groupby([0,1])['sum']
.transform(lambda x: x.iloc[:-1].mean()))
Upvotes: 1
Reputation: 12543
The following produces a mean for every two-team combination while dropping the last game.
I added a couple of games to demonstrate that:
Home Away Home_goals Away_goals
Team-1 Team-2 2 1
Team-3 Team-4 3 5
Team-2 Team-1 5 3
Team-4 Team-3 1 5
Team-2 Team-1 10 10
Team-4 Team-3 10 10
Code:
a = np.sort(df[["Home", "Away"]], axis=1)
df = pd.concat([df, pd.DataFrame(a, columns=["team1", "team2"])], axis="columns")
df["sum"] = df[["Home_goals", "Away_goals"]].sum(axis="columns")
drop_last = df.groupby(["team1", "team2"]).apply(lambda x: x.iloc[:-1]).drop(["team1", "team2"], axis="columns").reset_index()
drop_last["mean"] = drop_last.groupby(["team1", "team2"])["sum"].transform("mean")
drop_last = drop_last[["team1", "team2", "mean"]]
drop_last = drop_last.drop_duplicates()
res = pd.merge(df, drop_last, on = ["team1", "team2"])
print(res)
result:
Home Away Home_goals Away_goals team1 team2 sum mean
0 Team-1 Team-2 2 1 Team-1 Team-2 3 5.5
1 Team-2 Team-1 5 3 Team-1 Team-2 8 5.5
2 Team-2 Team-1 10 10 Team-1 Team-2 20 5.5
3 Team-3 Team-4 3 5 Team-3 Team-4 8 7.0
4 Team-4 Team-3 1 5 Team-3 Team-4 6 7.0
5 Team-4 Team-3 10 10 Team-3 Team-4 20 7.0
Upvotes: 0