Reputation: 575
I have a DataFrame with longitudinal data for many people. A success flag [0,1] is in one column.
A sample looks like this:
person success wait
0 bob 0 0
1 bob 1 1
2 bob 0 0
3 bob 0 1
4 bob 1 2
5 bob 0 0
6 mary 1 1
7 mary 0 0
8 mary 0 1
9 mary 0 2
Notice the last column named "wait". For this example, I calculated it by had. The wait is the count of 0 in the success column before a 1. In simple terms, it is, "How many failed attempts before the success.
I can't figure out a way to do this quickly with pandas. My thought was to groupby('person'), and then run some function on the wait column, but don't know what.
Any suggestions?
Upvotes: 2
Views: 2178
Reputation: 31161
You can do:
df['ix'] = df.groupby('person')['success'].apply(
lambda x: pd.Series(0).append(x.cumsum().head(-1))
).tolist()
df['wait'] = df.groupby(['person','ix']).apply(lambda x: pd.Series(range(len(x)))).tolist()
#In [97]: df
#Out[97]:
# person success wait ix
#0 bob 0 0 0
#1 bob 1 1 0
#2 bob 0 0 1
#3 bob 0 1 1
#4 bob 1 2 1
#5 bob 0 0 2
#6 mary 1 0 0
#7 mary 0 0 1
#8 mary 0 1 1
#9 mary 0 2 1
Upvotes: 1