Reputation: 1958
I have a pandas DataFrame with a boolean column that looks like this:
IS_TRUE
0 True
1 True
2 True
3 True
4 True
5 False
6 False
7 False
8 True
9 True
10 False
11 False
12 True
13 True
14 True
15 False
.
.
.
9000 False
I was wondering if there's a vectorized way to get the index ranges over which the IS_TRUE
column is True. In this case the result would be something like [(0,4),(8,9),(12,14)]
(inclusive). Exclusive end points would be fine too; I don't think that matters much.
I can run a for loop over the column, of course, but I'm curious whether there's a faster way.
Upvotes: 0
Views: 154
Reputation: 953
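You can use diff() to find the rows where the series switches between True and False (in either direction). The index of those rows gives the switch points, and consecutive pairs of switch points bound each run of True values.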
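# Rows where the value changes; fillna makes row 0 count as a start if it is True.
a = iter(df[df['IS_TRUE'].diff().fillna(df['IS_TRUE'].iloc[0])].index.tolist() + [len(df)])
# zip(a, a) pairs each start with the next switch point, so the inclusive end is el2 - 1.
print([(el1, el2 - 1) for el1, el2 in zip(a, a)])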
Output:
[(0, 4), (8, 9), (12, 14)]
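If you'd rather not depend on how diff() treats a boolean column, the same idea can be sketched directly in NumPy. This is only a minimal sketch: the helper name true_ranges is made up for illustration, and it returns row positions rather than index labels, which are the same thing only for a default RangeIndex like the one in the question.

```python
import numpy as np
import pandas as pd

def true_ranges(mask):
    """Return inclusive (start, end) positions of each run of True values.

    Hypothetical helper for illustration; works on positions, not index labels.
    """
    arr = np.asarray(mask, dtype=bool)
    # Pad with False on both sides so runs touching either end get closed.
    padded = np.concatenate(([False], arr, [False]))
    # +1 marks a False -> True step, -1 marks a True -> False step.
    steps = np.diff(padded.astype(np.int8))
    starts = np.flatnonzero(steps == 1)
    ends = np.flatnonzero(steps == -1) - 1
    return list(zip(starts.tolist(), ends.tolist()))

# Sample data copied from the first 16 rows of the question.
df = pd.DataFrame(
    {"IS_TRUE": [True] * 5 + [False] * 3 + [True] * 2 + [False] * 2 + [True] * 3 + [False]}
)
print(true_ranges(df["IS_TRUE"]))
```

Padding with False means a column that starts or ends with True needs no special handling.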
Upvotes: 1
Reputation: 323226
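You can do this with cumsum. The running count of False rows stays constant across each run of True rows, so it works as a group key: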
df = df.reset_index()
s = (~df['IS_TRUE']).cumsum()
out = df[df['IS_TRUE']].groupby(s)['index'].agg(['min','max'])
Out[16]:
         min  max
IS_TRUE
0          0    4
3          8    9
5         12   14
l = out.values.tolist()
Out[18]: [[0, 4], [8, 9], [12, 14]]
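For anyone who wants to see why the group keys come out as 0, 3 and 5, here is a self-contained sketch of the same cumsum grouping on the sample rows from the question. The names run_id, pos and bounds are just illustrative, and it assumes you want row positions rather than index labels.

```python
import pandas as pd

# Sample data copied from the first 16 rows of the question.
df = pd.DataFrame(
    {"IS_TRUE": [True] * 5 + [False] * 3 + [True] * 2 + [False] * 2 + [True] * 3 + [False]}
)
mask = df["IS_TRUE"]

# Every False row bumps the counter, so consecutive True rows share one label.
run_id = (~mask).cumsum()

# Row positions, kept on the same index so they align with the mask (assumption:
# positions are what is wanted; use df.index instead to get labels).
pos = pd.Series(range(len(df)), index=df.index)

# Keep only the True rows and take the first and last position in each run.
bounds = pos[mask].groupby(run_id[mask]).agg(["min", "max"])
ranges = [tuple(row) for row in bounds.to_numpy().tolist()]
print(ranges)
```

Because the False rows are filtered out before grouping, the gaps in the labels (1, 2, 4) simply never appear as groups.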
Upvotes: 1