Reputation: 318
Suppose I have a DataFrame df:
t status
1 ok
2 ok
3 ok
4 closed
5 closed
6 closed
7 bad input
8 bad input
9 closed
10 closed
11 ok
12 ok
13 closed
14 closed
I want to identify at what time "closed" appears and for how long.
So the result should be
t status index
1 ok 0
2 ok 0
3 ok 0
4 closed 1
5 closed 1
6 closed 1
7 bad input 0
8 bad input 0
9 closed 2
10 closed 2
11 ok 0
12 ok 0
13 closed 3
14 closed 3
I tried a standard for-loop approach, but it is not feasible for a large DataFrame. I am thinking of a solution using NumPy's where and repeat:
np.where(df['status'] == 'closed', 1, 0)
I am stuck on adding 1 every time "closed" reappears.
Upvotes: 0
Views: 71
Reputation: 75080
Trying something different, using more_itertools:
import more_itertools as mit
# index labels of the rows whose status is 'closed'
s = df[df.status.eq('closed')].index.tolist()
# map each label to the 1-based number of the consecutive run it belongs to
d = {v_: k + 1 for k, v in enumerate(mit.consecutive_groups(s)) for v_ in v}
# assign returns a new frame; assign this back with df=df.assign(..
df.assign(New=df.index.map(d).fillna(0).astype(int))
t status New
0 1 ok 0
1 2 ok 0
2 3 ok 0
3 4 closed 1
4 5 closed 1
5 6 closed 1
6 7 bad input 0
7 8 bad input 0
8 9 closed 2
9 10 closed 2
10 11 ok 0
11 12 ok 0
12 13 closed 3
13 14 closed 3
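If more_itertools is not available, the grouping step can be roughly reproduced with the standard library. This is only a sketch: consecutive_runs is a hypothetical helper name, the list s is hard-coded to the index labels of the "closed" rows in the example, and it assumes a default integer index in ascending order.

```python
from itertools import groupby

# Index labels of the "closed" rows in the example (assumed default RangeIndex).
s = [3, 4, 5, 8, 9, 12, 13]

def consecutive_runs(values):
    """Split sorted integers into lists of consecutive values (hypothetical helper)."""
    # Inside one run, value minus position stays constant, so it serves as the group key.
    for _, grp in groupby(enumerate(values), key=lambda p: p[1] - p[0]):
        yield [v for _, v in grp]

runs = list(consecutive_runs(s))
# Same mapping as above: index label -> 1-based run number.
d = {v: k + 1 for k, run in enumerate(runs) for v in run}
```

The resulting d can be passed to df.index.map(d) exactly as in the snippet above.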
Upvotes: 1
Reputation: 323226
IIUC, we can use shift and cumsum to create the condition:
df['New']=0
df.loc[df.status=='closed','New']=(df.status.eq('closed')&df.status.shift().ne('closed')).cumsum()
df
t status New
0 1 ok 0
1 2 ok 0
2 3 ok 0
3 4 closed 1
4 5 closed 1
5 6 closed 1
6 7 badinput 0
7 8 badinput 0
8 9 closed 2
9 10 closed 2
10 11 ok 0
11 12 ok 0
12 13 closed 3
13 14 closed 3
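Since the question also asks for how long each "closed" run lasts, here is a self-contained sketch of the same shift/cumsum idea followed by a groupby. The sample frame is rebuilt from the question; the names is_closed, run_start and runs are just illustrative, and it assumes each row stands for one time step so that the row count is the duration.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "t": range(1, 15),
    "status": ["ok"] * 3 + ["closed"] * 3 + ["bad input"] * 2
              + ["closed"] * 2 + ["ok"] * 2 + ["closed"] * 2,
})

is_closed = df["status"].eq("closed")
# A run starts where the row is closed and the previous row is not.
run_start = is_closed & ~is_closed.shift(fill_value=False)
# Number the runs 1, 2, 3, ... and set every non-closed row to 0.
df["New"] = run_start.cumsum().where(is_closed, 0)

# One row per closed run: first t, last t, and number of rows in the run.
runs = (
    df[is_closed]
    .groupby("New")["t"]
    .agg(start="min", end="max", length="count")
    .reset_index()
)
print(runs)
```

If t is not evenly spaced, end - start is probably a better measure of duration than the row count.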
Upvotes: 2