Reputation: 3
I have two columns I am working with. The first column is populated with zeros and the second column is populated with booleans.
column 1  column 2
       0      True
       0      True
       0     False
       0      True
       0      True
       0     False
       0     False
       0      True
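There are millions of rows, so I am trying to find an efficient way to look at column 2 and number each contiguous group of True values in column 1: the first group gets 1, the next group gets 2, and so on, while rows where column 2 is False stay 0. The desired result: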
column 1  column 2
       1      True
       1      True
       0     False
       2      True
       2      True
       0     False
       0     False
       3      True
Any help is much appreciated!
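To pin down the intended semantics, the numbering can be written as a plain-Python loop. This is only a reference sketch: `number_true_runs_loop` is a hypothetical helper, not from the question, and a per-row loop like this will be far too slow for millions of rows, but it is handy for checking a vectorized version against.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "column 1": [0] * 8,
    "column 2": [True, True, False, True, True, False, False, True],
})

def number_true_runs_loop(flags):
    """Reference (slow) version: label each contiguous run of True with 1, 2, 3, ...;
    positions that are False get 0."""
    out = []
    run_id = 0
    prev = False
    for flag in flags:
        if flag and not prev:
            run_id += 1  # a new run of True starts here
        out.append(run_id if flag else 0)
        prev = flag
    return out

df["column 1"] = number_true_runs_loop(df["column 2"])
print(df["column 1"].tolist())  # [1, 1, 0, 2, 2, 0, 0, 3]
```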
Upvotes: 0
Views: 75
Reputation: 294218
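# column 3: True on the first row of each run of True values
df['column 3'] = df['column 2'] & (df['column 2'].shift() != True)
# column 4: running count of runs started so far
df['column 4'] = df['column 3'].cumsum()
# column 1: the run number where column 2 is True, 0 elsewhere
df['column 1'] = df['column 2'] * df['column 4']
print(df)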
   column 1  column 2  column 3  column 4
0         1      True      True         1
1         1      True     False         1
2         0     False     False         1
3         2      True      True         2
4         2      True     False         2
5         0     False     False         2
6         0     False     False         2
7         3      True      True         3
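The `!= True` comparison above is there because `shift()` puts NaN in the first row, and NaN is not a boolean. A hedged alternative, assuming a pandas version where `Series.shift` accepts `fill_value` (0.24 or later), is to fill that first row with False so the shifted Series stays boolean and `~` can be used directly:

```python
import pandas as pd

df = pd.DataFrame({"column 2": [True, True, False, True, True, False, False, True]})
c = df["column 2"]

# Fill the first row with False instead of NaN; astype(bool) just guarantees a bool dtype
prev = c.shift(fill_value=False).astype(bool)
starts = c & ~prev                    # True on the first row of each run of True
df["column 1"] = starts.cumsum() * c  # run number inside runs, 0 elsewhere
print(df["column 1"].tolist())  # [1, 1, 0, 2, 2, 0, 0, 3]
```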
Upvotes: 0
Reputation: 353009
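One trick that often comes in handy when vectorizing operations on contiguous groups is the shift-cumsum pattern: compare the series with its shifted self to flag the first row of each run, cumsum those flags to get a running run number, then multiply by the original booleans so rows outside a run become 0: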
>>> c = df["column 2"]
>>> c * (c & (c != c.shift())).cumsum()
0 1
1 1
2 0
3 2
4 2
5 0
6 0
7 3
Name: column 2, dtype: int32
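The same shift-cumsum idea can be written directly on a NumPy array, which may cut some pandas overhead on millions of rows (this is an untested assumption about speed, not a benchmark). `number_true_runs` is a hypothetical helper name; the "shift" here is done by prepending False, since the first row has no predecessor.

```python
import numpy as np
import pandas as pd

def number_true_runs(flags):
    """Label each contiguous run of True with 1, 2, 3, ...; False positions get 0."""
    a = np.asarray(flags, dtype=bool)
    # Previous value for each position; position 0 has no predecessor, so use False
    prev = np.concatenate(([False], a[:-1]))
    starts = a & ~prev            # True where a run of True begins
    return np.cumsum(starts) * a  # run number inside runs, 0 elsewhere

df = pd.DataFrame({"column 2": [True, True, False, True, True, False, False, True]})
df["column 1"] = number_true_runs(df["column 2"].to_numpy())
print(df["column 1"].tolist())  # [1, 1, 0, 2, 2, 0, 0, 3]
```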
Upvotes: 3