Reputation: 414
this is my dataset:
Day Player Score
1 Andy 6
2 Boby 6
3 Andy 4
4 Carl 2
5 Andy 3
This is a simplified example, but basically I want to add a new column ('last_pass_date') which looks for the previous index when the 'Player' achieves a score >5.
What I have is:
df['last_pass_date'] = df.groupby(['Player'])['Day'].shift(1)
but I am unsure on how to add the additional filter of 'Score' > 5. Any help will be greatly appreciated. Thanks!
EDIT: Output from my solution is [NaN,NaN,'1',NaN,'3'] when ideally it should be [NaN,NaN,'1',NaN,'1'] since row 3 does not satisfy Score > 5.
Upvotes: 0
Views: 47
Reputation: 3161
How about this:
df = pd.DataFrame({"Day": [1, 2, 3, 4, 5],
"Player": ["Andy", "Boby", "Andy", "Carl", "Andy"],
"Score": [6, 6, 4, 2, 3],
})
df['last_pass_date'] = None
passed_indices = df["Score"] > 5
df.loc[passed_indices, 'last_pass_date'] = df.loc[passed_indices, 'Day']
df['last_pass_date'] = df.groupby("Player")['last_pass_date'].apply(
lambda group: group.shift().ffill())
print(df)
Produces:
Day Player Score last_pass_date
0 1 Andy 6 NaN
1 2 Boby 6 NaN
2 3 Andy 4 1.0
3 4 Carl 2 NaN
4 5 Andy 3 1.0
The idea is to set last_pass_date to the current date when the player passes, then forward fill that column for each player. The only extra trick required is to shift by 1 before forward filling the column grouped by players (that way you get the pass date on which the player previously passed)
Upvotes: 1
Reputation: 28253
Create a column of passed
:
df['passed'] = df.Day[df.Score > 5]
Group by Player
, shift passed
and forward fill to carry the last pass date
df['last_pass_date'] = df.groupby('Player').passed.apply(lambda x: x.shift().ffill())
Produces the following output for me:
Day Player Score passed last_pass_date
0 1 Andy 6 1.0 NaN
1 2 Boby 6 2.0 NaN
2 3 Andy 4 NaN 1.0
3 4 Carl 2 NaN NaN
4 5 Andy 3 NaN 1.0
Upvotes: 1