Reputation: 449
I have the following large dataset recording the result of a math competition among students in descending order of date: So for example, student 1 comes third in Race 1 while student 3 won Race 2, etc.
Race_ID Date Student_ID Rank
21 1/1/2023 1 3
21 1/1/2023 2 2
21 1/1/2023 3 1
21 1/1/2023 4 4
25 11/9/2022 1 2
25 11/9/2022 2 3
25 11/9/2022 3 1
3 17/4/2022 5 4
3 17/4/2022 2 1
3 17/4/2022 3 2
3 17/4/2022 4 3
14 1/3/2022 1 1
14 1/3/2022 2 2
85 1/1/2021 1 2
85 1/1/2021 2 3
85 1/1/2021 3 1
And I want to create a new column called Last_win
which returns the Race_ID
of the last time that student won (i.e. rank number 1). So the outcome should look like
Race_ID Date Student_ID Rank Last_win
21 1/1/2023 1 3 14
21 1/1/2023 2 2 3
21 1/1/2023 3 1 25
21 1/1/2023 4 4 NaN
25 11/9/2022 1 2 14
25 11/9/2022 2 3 3
25 11/9/2022 3 1 85
3 17/4/2022 5 4 NaN
3 17/4/2022 2 1 NaN
3 17/4/2022 3 2 85
3 17/4/2022 4 3 NaN
14 1/3/2022 1 1 NaN
14 1/3/2022 2 2 NaN
85 1/1/2021 1 2 NaN
85 1/1/2021 2 3 NaN
85 1/1/2021 3 1 NaN
Thank you so much inadvacne.
Upvotes: 1
Views: 68
Reputation: 120391
Keep rows where Rank is 1 and hide others then group by Student. Finally, fill backward Race_ID and don't forget to shift the result:
# Convert Date column to DatetimeIndex if needed
# df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
df['Last_win'] = (df.where(df['Rank'] == 1)
.groupby(df['Student_ID'])['Race_ID']
.transform(lambda x: x.bfill().shift(-1)))
print(df)
# Output
Race_ID Date Student_ID Rank Last_win
0 21 2023-01-01 1 3 14.0
1 21 2023-01-01 2 2 3.0
2 21 2023-01-01 3 1 25.0
3 21 2023-01-01 4 4 NaN
4 25 2022-09-11 1 2 14.0
5 25 2022-09-11 2 3 3.0
6 25 2022-09-11 3 1 85.0
7 3 2022-04-17 5 4 NaN
8 3 2022-04-17 2 1 NaN
9 3 2022-04-17 3 2 85.0
10 3 2022-04-17 4 3 NaN
11 14 2022-03-01 1 1 NaN
12 14 2022-03-01 2 2 NaN
13 85 2021-01-01 1 2 NaN
14 85 2021-01-01 2 3 NaN
15 85 2021-01-01 3 1 NaN
Upvotes: 2
Reputation: 862511
Use merge_asof
for merge previous datetimes by parameter allow_exact_matches=False
:
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
df = df.sort_values('Date')
df1 = (df.loc[df['Rank'].eq(1), ['Date','Student_ID','Race_ID']]
.rename(columns={'Race_ID':'Last_win'}))
df = (pd.merge_asof(df.reset_index(),
df1, on='Date', allow_exact_matches=False, by='Student_ID')
.sort_values('index', ignore_index=True)
.drop('index', axis=1))
print (df)
Race_ID Date Student_ID Rank Last_win
0 21 2023-01-01 1 3 14.0
1 21 2023-01-01 2 2 3.0
2 21 2023-01-01 3 1 25.0
3 21 2023-01-01 4 4 NaN
4 25 2022-09-11 1 2 14.0
5 25 2022-09-11 2 3 3.0
6 25 2022-09-11 3 1 85.0
7 3 2022-04-17 5 4 NaN
8 3 2022-04-17 2 1 NaN
9 3 2022-04-17 3 2 85.0
10 3 2022-04-17 4 3 NaN
11 14 2022-03-01 1 1 NaN
12 14 2022-03-01 2 2 NaN
13 85 2021-01-01 1 2 NaN
14 85 2021-01-01 2 3 NaN
15 85 2021-01-01 3 1 NaN
Upvotes: 1