Adrian Y
Adrian Y

Reputation: 414

Adding filters to groupby function

this is my dataset:

Day    Player    Score
1        Andy      6
2        Boby      6
3        Andy      4
4        Carl      2
5        Andy      3

This is a simplified example, but basically I want to add a new column ('last_pass_date') which looks for the previous index when the 'Player' achieves a score >5.

What I have is:

df['last_pass_date'] = df.groupby(['Player'])['Day'].shift(1)

but I am unsure on how to add the additional filter of 'Score' > 5. Any help will be greatly appreciated. Thanks!

EDIT: Output from my solution is [NaN,NaN,'1',NaN,'3'] when ideally it should be [NaN,NaN,'1',NaN,'1'] since row 3 does not satisfy Score > 5.

Upvotes: 0

Views: 47

Answers (2)

augray
augray

Reputation: 3161

How about this:

df = pd.DataFrame({"Day": [1, 2, 3, 4, 5],
                   "Player": ["Andy", "Boby", "Andy", "Carl", "Andy"],
                   "Score": [6, 6, 4, 2, 3],
                   })
df['last_pass_date'] = None
passed_indices = df["Score"] > 5
df.loc[passed_indices, 'last_pass_date'] = df.loc[passed_indices, 'Day']
df['last_pass_date'] = df.groupby("Player")['last_pass_date'].apply(
    lambda group: group.shift().ffill())
print(df)

Produces:

   Day Player  Score  last_pass_date
0    1   Andy      6             NaN
1    2   Boby      6             NaN
2    3   Andy      4             1.0
3    4   Carl      2             NaN
4    5   Andy      3             1.0

The idea is to set last_pass_date to the current date when the player passes, then forward fill that column for each player. The only extra trick required is to shift by 1 before forward filling the column grouped by players (that way you get the pass date on which the player previously passed)

Upvotes: 1

Haleemur Ali
Haleemur Ali

Reputation: 28253

Create a column of passed:

df['passed'] = df.Day[df.Score > 5]

Group by Player, shift passed and forward fill to carry the last pass date

df['last_pass_date'] = df.groupby('Player').passed.apply(lambda x: x.shift().ffill())

Produces the following output for me:

   Day Player  Score  passed  last_pass_date
0    1   Andy      6     1.0             NaN
1    2   Boby      6     2.0             NaN
2    3   Andy      4     NaN             1.0
3    4   Carl      2     NaN             NaN
4    5   Andy      3     NaN             1.0

Upvotes: 1

Related Questions