Reputation: 527
I have dataset for matches and home team result at the current match
match_date home away home_result
2021-11-22 team1 team2 Win
2021-11-22 team3 team4 Win
2021-11-23 team1 team8 Lose
2021-11-23 team6 team7 Win
2021-11-25 team1 team2 Win
2021-11-25 team3 team8 Lose
2021-11-25 team1 team5 Lose
2021-11-25 team6 team5 Win
2021-11-28 team3 team1 Lose
2021-11-29 team1 team5 Win
2021-11-29 team6 team9 Win
I have a code to create new column where I can put previous result for each home team before the current match:
df['home_team_previous_results'] = (
df.groupby('home')
.apply(
lambda x: pd.Series(
[
[
tuple([row[col] for col in ['home_result', 'match_date']])
for _, row in x.iloc[0:i].iterrows()
] or np.nan
for i in range(len(x))
],
index=x.index)
).droplevel(0)
)
this is the output:
match_date home away home_result home_team_previous_results
2021-11-22 team1 team2 Win NaN
2021-11-22 team3 team4 Win NaN
2021-11-23 team1 team8 Lose [("Win","2021-11-22")]
2021-11-23 team6 team7 Win NaN
2021-11-25 team1 team2 Win [("Win","2021-11-22"), ("Lose","2021-11-23")]
2021-11-25 team3 team8 Lose [("Win","2021-11-22")]
2021-11-25 team1 team5 Lose [("Win","2021-11-22"), ("Lose","2021-11-23"), ("Win","2021-11-25")]
2021-11-25 team6 team5 Win [("Win","2021-11-23")]
2021-11-28 team3 team1 Lose [("Win","2021-11-22"), ("Lose","2021-11-25")]
2021-11-29 team1 team5 Win [("Win","2021-11-22"), ("Lose","2021-11-23"), ("Win","2021-11-25"), ("Lose","2021-11-25")]
2021-11-29 team6 team9 Win [("Win","2021-11-23"), ("Win","2021-11-25")]
The issue is that the code takes a long time with a large dataset (more than 60,000 rows) and I need it to run very, very fast. Any idea how to make it faster or better version of it ?
Upvotes: 0
Views: 142
Reputation: 323226
You can do egg tuple
then cumsum
df['new'] = df[['match_date','home_result']].agg(tuple,1).groupby(df['home']).apply(lambda x : x.cumsum().shift())
0 NaN
1 NaN
2 (2021-11-22, Win)
3 NaN
4 (2021-11-22, Win, 2021-11-23, Lose)
5 (2021-11-22, Win)
6 (2021-11-22, Win, 2021-11-23, Lose, 2021-11-25...
7 (2021-11-23, Win)
8 (2021-11-22, Win, 2021-11-25, Lose)
9 (2021-11-22, Win, 2021-11-23, Lose, 2021-11-25...
10 (2021-11-23, Win, 2021-11-25, Win)
dtype: object
Upvotes: 3