Pandas DataFrame - Count rows between values efficiently

Question

I have a DataFrame with longitudinal data for many people. A success flag [0,1] is in one column.

A sample looks like this:

  person  success  wait
0    bob        0     0
1    bob        1     1
2    bob        0     0
3    bob        0     1
4    bob        1     2
5    bob        0     0
6   mary        1     1
7   mary        0     0
8   mary        0     1
9   mary        0     2

Notice the last column named "wait". For this example, I calculated it by had. The wait is the count of 0 in the success column before a 1. In simple terms, it is, "How many failed attempts before the success.

I can't figure out a way to do this quickly with pandas. My thought was to groupby('person'), and then run some function on the wait column, but don't know what.

Any suggestions?

Colonel Beauvel · Accepted Answer

You can do:

df['ix'] = df.groupby('person')['success'].apply(
    lambda x: pd.Series(0).append(x.cumsum().head(-1))
).tolist()


df['wait'] = df.groupby(['person','ix']).apply(lambda x: pd.Series(range(len(x)))).tolist()

#In [97]: df
#Out[97]:
#  person  success  wait  ix
#0    bob        0     0   0
#1    bob        1     1   0
#2    bob        0     0   1
#3    bob        0     1   1
#4    bob        1     2   1
#5    bob        0     0   2
#6   mary        1     0   0
#7   mary        0     0   1
#8   mary        0     1   1
#9   mary        0     2   1

Pandas DataFrame - Count rows between values efficiently

Answers (1)

Related Questions