Nicole Suter
Nicole Suter

Reputation: 99

Python/pandas: How to increment each row based on string condition

I have a dataframe which is populated by answers from a Google Form. The new_cycle column is yes/no question on the form. In my dataframe, I need to calculate the cycle_day for each response. In other words, I need the 'yes' to be 1, and every 'no' afterwards to increment by 1. Then, once another 'yes' response is recorded, the count resets.

Here is a minimal, reproducible example. Note that I'm actually doing this on a much larger dataframe, so an optimal solution is paramount.

df = pd.DataFrame(['yes', 'no', 'no', 'no', 'yes', 'no'], columns=['new_cycle'])

# df:
    new_cycle
0   yes
1   no
2   no
3   no
4   yes
5   no

My desired output would be:

    new_cycle
0   1
1   2
2   3
3   4
4   1
5   2

# OR:
    new_cycle   cycle_day
0   yes         1
1   no          2
2   no          3
3   no          4
4   yes         1
5   no          2

How would I do this?

Upvotes: 1

Views: 1141

Answers (2)

Scott Boston
Scott Boston

Reputation: 153460

Let's try this and do it one step:

df['cycle_day'] = df.groupby(df['new_cycle'].eq('yes').cumsum()).cumcount() + 1

Output:

 new_cycle  cycle_day
0       yes          1
1        no          2
2        no          3
3        no          4
4       yes          1
5        no          2

Upvotes: 2

Yifei H
Yifei H

Reputation: 76

  1. first add a new column and set them all to 1

    df['val'] = 1

  2. transform new_cycle column into 1 (yes) and 0 (no)

    df['reset'] = df['new_cycle'] == 'yes'

  3. do a cumsum on the new reset column

    df['cumsum']=df['reset'].cumsum()

  4. group by the cumsum column, then do a cumsum on val column and get the result

    df['cycle_day'] = df.groupby(['cumsum'])['val'].cumsum()

This is adapted from this answer

Upvotes: 1

Related Questions