Reputation: 99
I have a dataframe which is populated by answers from a Google Form. The new_cycle
column is yes/no question on the form. In my dataframe, I need to calculate the cycle_day
for each response. In other words, I need the 'yes' to be 1, and every 'no' afterwards to increment by 1. Then, once another 'yes' response is recorded, the count resets.
Here is a minimal, reproducible example. Note that I'm actually doing this on a much larger dataframe, so an optimal solution is paramount.
df = pd.DataFrame(['yes', 'no', 'no', 'no', 'yes', 'no'], columns=['new_cycle'])
# df:
new_cycle
0 yes
1 no
2 no
3 no
4 yes
5 no
My desired output would be:
new_cycle
0 1
1 2
2 3
3 4
4 1
5 2
# OR:
new_cycle cycle_day
0 yes 1
1 no 2
2 no 3
3 no 4
4 yes 1
5 no 2
How would I do this?
Upvotes: 1
Views: 1141
Reputation: 153460
Let's try this and do it one step:
df['cycle_day'] = df.groupby(df['new_cycle'].eq('yes').cumsum()).cumcount() + 1
Output:
new_cycle cycle_day
0 yes 1
1 no 2
2 no 3
3 no 4
4 yes 1
5 no 2
Upvotes: 2
Reputation: 76
first add a new column and set them all to 1
df['val'] = 1
transform new_cycle column into 1 (yes) and 0 (no)
df['reset'] = df['new_cycle'] == 'yes'
do a cumsum on the new reset column
df['cumsum']=df['reset'].cumsum()
group by the cumsum column, then do a cumsum on val column and get the result
df['cycle_day'] = df.groupby(['cumsum'])['val'].cumsum()
This is adapted from this answer
Upvotes: 1