nokvk
nokvk

Reputation: 353

Cumcount in new column but only starting from a condition

I want to create a new column in my Pandas dataframe that counts how many days have passed since the value in another column is a particular one. The goal in this example is to count how many days have passed since a player has scored a goal. I show you how would be the structure of the table that I would like to generate, in which a new column "Days since goal" would be created and it would start counting since the value "Yes" is included in the column "Goal".

      Player      Date            Goal     Days since goal
0     John        11-7-2022       No       0
1     Charles     11-7-2022       Yes      1
2     John        12-7-2022       Yes      1
3     Charles     12-7-2022       No       2
4     John        13-7-2022       No       2
5     Charles     13-7-2022       No       3
6     John        14-7-2022       No       3
7     Charles     14-7-2022       No       4
8     Peter       15-7-2022       No       0
9     John        15-7-2022       Yes      1
10    Charles     16-7-2022       No       5
11    Peter       16-7-2022       Yes      1
12    John        16-7-2022       No       2

I'm very closed to get it with this code:

blocks = df['Goal'].ge("Yes").groupby(df['Player']).cumsum()
df['Days since goal'] = df.groupby([df['Player'],blocks]).cumcount()

But the result it's not good at all because with the first "Yes" in the column "Days since goal" shows 0 instead of 1. This is the result I get with the code but it isn't what I need:

      Player      Date            Goal     Days since goal
0     John        11-7-2022       No       0
1     Charles     11-7-2022       Yes      0
2     John        12-7-2022       Yes      0
3     Charles     12-7-2022       No       1
4     John        13-7-2022       No       1
5     Charles     13-7-2022       No       2
6     John        14-7-2022       No       2
7     Charles     14-7-2022       No       3
8     Peter       15-7-2022       No       0
9     John        15-7-2022       Yes      0
10    Charles     16-7-2022       No       4
11    Peter       16-7-2022       Yes      0
12    John        16-7-2022       No       1

I need 0 for the player until the moment Goal is "Yes", and in this case, if it's "Yes", automatically is 1. Any suggestion? Thanks in advance

Upvotes: 1

Views: 195

Answers (1)

Laurent
Laurent

Reputation: 13458

With your initial dataframe:

df = pd.DataFrame({'Player': ['John', 'Charles', 'John', 'Charles', 'John', 'Charles', 'John', 'Charles', 'Peter', 'John', 'Charles', 'Peter', 'John'], 'Date': ['11-7-2022', '11-7-2022', '12-7-2022', '12-7-2022', '13-7-2022', '13-7-2022', '14-7-2022', '14-7-2022', '15-7-2022', '15-7-2022', '16-7-2022', '16-7-2022', '16-7-2022'], 'Goal': ['No', 'Yes', 'Yes', 'No', 'No', 'No', 'No', 'No', 'No', 'Yes', 'No', 'Yes', 'No']}

Here is one way to do it:

def count(goals):
    """Helper function.
    
    Args:
        goals: list of Yes/No values.
    
    Returns:
        Cumcount of goals.
    
    """
    counts = []
    for i, goal in enumerate(goals):
        if i == 0 and goal == "No":
            counts.append(0)
            continue
        if goal == "":
            continue
        for i in range(2, 2 + int(len(goal) / 2)):
            counts.append(i)
    return counts
df = df.sort_values(by=["Player", "Date"])

no_counts = []
for player in df["Player"].unique():
    goals = "".join(df.loc[df["Player"] == player, "Goal"].tolist()).split("Yes")
    no_counts += count(goals)

df.loc[df["Goal"] == "No", "Days since goal"] = no_counts

df = df.fillna(1).sort_index().astype({"Days since goal": int})
print(df)
# Output
     Player       Date Goal  Days since goal
0      John  11-7-2022   No                0
1   Charles  11-7-2022  Yes                1
2      John  12-7-2022  Yes                1
3   Charles  12-7-2022   No                2
4      John  13-7-2022   No                2
5   Charles  13-7-2022   No                3
6      John  14-7-2022   No                3
7   Charles  14-7-2022   No                4
8     Peter  15-7-2022   No                0
9      John  15-7-2022  Yes                1
10  Charles  16-7-2022   No                5
11    Peter  16-7-2022  Yes                1
12     John  16-7-2022   No                2

Upvotes: 1

Related Questions