luka
luka

Reputation: 529

Groupby dataframe and count based on column condition

I have this kind of Dataframe

H = Home win
D = Draw
A = Away win

            Datetime    HomeTeam            AwayTeam            HG  AG  FT
0   2021-02-17 22:00:00 Colo Colo           U. De Concepcion    1   0   H
1   2021-02-15 14:30:00 Cobresal            U. Espanola         4   1   H
2   2021-02-14 22:00:00 Deportes Iquique    S. Wanderers        2   0   H
3   2021-02-14 22:00:00 La Serena           A. Italiano         0   2   A
4   2021-02-14 22:00:00 O'Higgins           Colo Colo           1   1   D
... ... ... ... ... ... ...

I want to count the previous wins of teams per match in the rows

Code:

_hometeam_countH = df.groupby(["HomeTeam", 'FT' == 'H']).apply(lambda x: x.iloc[1:, :]["FT"].count())
_hometeam_count = df.groupby("HomeTeam").apply(lambda x: x.iloc[1:, :]["FT"].count())

df["Wins/Match"] = df["HomeTeam"].apply(lambda x: (_hometeam_countH.loc[x] if x in _hometeam_countH.index else 0) / (_hometeam_count.loc[x] if x in _hometeam_count.index else 0))

But got _hometeam_countH = df.groupby(["HomeTeam", 'FT' == 'H']).apply(lambda x: x.iloc[1:, :]["FT"].count()), KeyError: False

How can I groupby the dataframe by "HomeTeam" and "FT" with a conditions?

Upvotes: 1

Views: 67

Answers (1)

Rutger
Rutger

Reputation: 603

I can't test it without the dataframe but I think I see where it goes wrong. In the groupby, you put a list of column names normally. You put 'FT' == 'H' which checks if FT is the same as H (surprisingly it is not).

You might want to groupby lke this:

_hometeam_countH = df.loc[df['FT'] == 'H'].groupby('HomeTeam').size()

which gives the number of home games per team. Did I interpret that well?

Upvotes: 2

Related Questions