Reputation: 353
I want to create a new column in my Pandas dataframe that counts how many days have passed since the value in another column is a particular one. The goal in this example is to count how many days have passed since a player has scored a goal. I show you how would be the structure of the table that I would like to generate, in which a new column "Days since goal" would be created and it would start counting since the value "Yes" is included in the column "Goal".
Player Date Goal Days since goal
0 John 11-7-2022 No 0
1 Charles 11-7-2022 Yes 1
2 John 12-7-2022 Yes 1
3 Charles 12-7-2022 No 2
4 John 13-7-2022 No 2
5 Charles 13-7-2022 No 3
6 John 14-7-2022 No 3
7 Charles 14-7-2022 No 4
8 Peter 15-7-2022 No 0
9 John 15-7-2022 Yes 1
10 Charles 16-7-2022 No 5
11 Peter 16-7-2022 Yes 1
12 John 16-7-2022 No 2
I'm very closed to get it with this code:
blocks = df['Goal'].ge("Yes").groupby(df['Player']).cumsum()
df['Days since goal'] = df.groupby([df['Player'],blocks]).cumcount()
But the result it's not good at all because with the first "Yes" in the column "Days since goal" shows 0 instead of 1. This is the result I get with the code but it isn't what I need:
Player Date Goal Days since goal
0 John 11-7-2022 No 0
1 Charles 11-7-2022 Yes 0
2 John 12-7-2022 Yes 0
3 Charles 12-7-2022 No 1
4 John 13-7-2022 No 1
5 Charles 13-7-2022 No 2
6 John 14-7-2022 No 2
7 Charles 14-7-2022 No 3
8 Peter 15-7-2022 No 0
9 John 15-7-2022 Yes 0
10 Charles 16-7-2022 No 4
11 Peter 16-7-2022 Yes 0
12 John 16-7-2022 No 1
I need 0 for the player until the moment Goal is "Yes", and in this case, if it's "Yes", automatically is 1. Any suggestion? Thanks in advance
Upvotes: 1
Views: 195
Reputation: 13458
With your initial dataframe:
df = pd.DataFrame({'Player': ['John', 'Charles', 'John', 'Charles', 'John', 'Charles', 'John', 'Charles', 'Peter', 'John', 'Charles', 'Peter', 'John'], 'Date': ['11-7-2022', '11-7-2022', '12-7-2022', '12-7-2022', '13-7-2022', '13-7-2022', '14-7-2022', '14-7-2022', '15-7-2022', '15-7-2022', '16-7-2022', '16-7-2022', '16-7-2022'], 'Goal': ['No', 'Yes', 'Yes', 'No', 'No', 'No', 'No', 'No', 'No', 'Yes', 'No', 'Yes', 'No']}
Here is one way to do it:
def count(goals):
"""Helper function.
Args:
goals: list of Yes/No values.
Returns:
Cumcount of goals.
"""
counts = []
for i, goal in enumerate(goals):
if i == 0 and goal == "No":
counts.append(0)
continue
if goal == "":
continue
for i in range(2, 2 + int(len(goal) / 2)):
counts.append(i)
return counts
df = df.sort_values(by=["Player", "Date"])
no_counts = []
for player in df["Player"].unique():
goals = "".join(df.loc[df["Player"] == player, "Goal"].tolist()).split("Yes")
no_counts += count(goals)
df.loc[df["Goal"] == "No", "Days since goal"] = no_counts
df = df.fillna(1).sort_index().astype({"Days since goal": int})
print(df)
# Output
Player Date Goal Days since goal
0 John 11-7-2022 No 0
1 Charles 11-7-2022 Yes 1
2 John 12-7-2022 Yes 1
3 Charles 12-7-2022 No 2
4 John 13-7-2022 No 2
5 Charles 13-7-2022 No 3
6 John 14-7-2022 No 3
7 Charles 14-7-2022 No 4
8 Peter 15-7-2022 No 0
9 John 15-7-2022 Yes 1
10 Charles 16-7-2022 No 5
11 Peter 16-7-2022 Yes 1
12 John 16-7-2022 No 2
Upvotes: 1