Joe

Reputation: 1958

Vectorized "True" value ranges for a boolean column in Pandas DataFrame

I have a pandas DataFrame with a boolean column that looks like this:

    IS_TRUE
           
0      True
1      True
2      True
3      True
4      True
5     False
6     False
7     False
8      True
9      True
10    False
11    False
12     True
13     True
14     True
15     False
.
.
.
9000   False

I was wondering if there's a vectorized way to get the ranges for which the IS_TRUE column is True. In this case it would be something like [(0,4),(8,9),(12,14)] (inclusive). Exclusive end points would be fine too; I don't think that's an issue.

I can run a for loop over the column, of course, but I am curious whether there's a faster way.

Upvotes: 0

Views: 154

Answers (2)

LevB

Reputation: 953

You can use diff() to identify where your series switches between True and False. Filtering the frame on those rows leaves an index that contains the switch points (this assumes the default 0..n-1 integer index).

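# diff() flags each row that differs from the previous one; fill the leading NaN with the first value.
switches = df['IS_TRUE'].diff().fillna(df['IS_TRUE'].iloc[0]).astype(bool)
a = iter(df[switches].index.tolist() + [len(df)])
print([(el1, el2 - 1) for el1, el2 in zip(a, a)])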

Output:

[(0, 4), (8, 9), (12, 14)] 
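
The same switch-point idea can also be sketched in plain NumPy. This is only an illustration under a few assumptions: the data is the sample from the question, `true_ranges` is a hypothetical helper name (not a pandas or NumPy function), and the result holds positions rather than index labels, which coincide only for the default integer index.

```python
import numpy as np
import pandas as pd

def true_ranges(mask):
    """Return inclusive (start, end) positions of each run of True values."""
    m = np.asarray(mask, dtype=bool).astype(np.int8)
    # Pad with a zero on both sides so runs touching either end still produce an edge.
    edges = np.diff(np.concatenate(([0], m, [0])))
    starts = np.flatnonzero(edges == 1)       # False -> True transitions
    ends = np.flatnonzero(edges == -1) - 1    # True -> False transitions, made inclusive
    return list(zip(starts.tolist(), ends.tolist()))

# Sample column from the question (first 16 rows only).
df = pd.DataFrame(
    {"IS_TRUE": [True] * 5 + [False] * 3 + [True] * 2 + [False] * 2 + [True] * 3 + [False]}
)
print(true_ranges(df["IS_TRUE"]))  # [(0, 4), (8, 9), (12, 14)]
```

Padding the array removes the need for special handling of a column that starts or ends with True.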

Upvotes: 1

BENY

Reputation: 323226

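You can use cumsum on the negated column to give each run of True values its own group id, then take the min and max index of each group: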

df = df.reset_index()
# Each False row increments the counter, so every run of True rows shares one group id.
s = (~df['IS_TRUE']).cumsum()
out = df[df['IS_TRUE']].groupby(s)['index'].agg(['min','max'])
out
Out[16]: 
         min  max
IS_TRUE          
0          0    4
3          8    9
5         12   14

l = out.values.tolist()
l
Out[18]: [[0, 4], [8, 9], [12, 14]]
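
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same cumsum grouping. It is a variation rather than the exact code above: it assumes the sample data from the question, skips reset_index() by grouping the index labels directly, and the names `run_id` and `ranges` are just illustrative.

```python
import pandas as pd

# Sample column from the question (first 16 rows only).
df = pd.DataFrame(
    {"IS_TRUE": [True] * 5 + [False] * 3 + [True] * 2 + [False] * 2 + [True] * 3 + [False]}
)

is_true = df["IS_TRUE"]

# The counter increases on every False row, so it stays constant within a run of True rows.
run_id = (~is_true).cumsum()

# Keep only the True rows, then take the first and last index label of each run.
labels = df.index.to_series()
out = labels[is_true].groupby(run_id[is_true]).agg(["min", "max"])

# Convert the two-column result into a list of (start, end) tuples.
ranges = list(out.itertuples(index=False, name=None))
print(ranges)  # [(0, 4), (8, 9), (12, 14)]
```

Because the grouping works on index labels, this variant returns labels rather than positions if the frame does not use the default integer index.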

Upvotes: 1
