Reputation: 1

How to create subsets of a pandas DataFrame based on a boolean column?

I am working with a pandas DataFrame where one column ('bullish') consists of boolean values and a second column ('split') is also boolean: it is True whenever the value in the first column differs from the one before it. I did this:

df['split'] = df['bullish'] != df['bullish'].shift(-1)

Now I would like to slice the DataFrame into smaller subsets at each point where the 'split' value is True, so that within each subset the df['bullish'] values are either all True or all False.

[Image: the DataFrame I have]

Thank you in advance for your insights!

Upvotes: 0

Answers (1)

piterbarg

Reputation: 8219

Welcome to Stack Overflow. Please review "How to make good reproducible pandas examples" before asking further questions about pandas.

As for your specific question, let's start with a simple example:

import pandas as pd
df = pd.DataFrame({'bullish' : [True, True, True, True, False, False, True]})

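Then we can mark the split points like this (note: I think you got the shift(-1) slightly wrong; shift(-1) compares each row with the next one, whereas shift() compares it with the previous one, which is what you described):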

df['split'] = df['bullish'] != df['bullish'].shift() 
df

This produces:

   bullish  split
0     True   True
1     True  False
2     True  False
3     True  False
4    False   True
5    False  False
6     True   True
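To make the shift(-1) vs shift() difference concrete, here is a small sketch on the same example data (the names next_diff and prev_diff are just illustrative). With shift(-1) the True flags land on the last row of each run instead of the first, so a cumulative sum over them would push each run's last row into the following group.

```python
import pandas as pd

# Same example data as above
df = pd.DataFrame({'bullish': [True, True, True, True, False, False, True]})

# shift(-1) moves values up: row i is compared with row i + 1 (the NEXT row)
next_diff = df['bullish'] != df['bullish'].shift(-1)

# shift() moves values down: row i is compared with row i - 1 (the PREVIOUS row)
prev_diff = df['bullish'] != df['bullish'].shift()

print(pd.DataFrame({'bullish': df['bullish'],
                    'vs_next': next_diff,
                    'vs_previous': prev_diff}))
```

Only prev_diff flags the first row of every run (row 0 is True because it is compared against a missing value), which is what the groupby/cumsum step relies on.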

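To achieve what you want, you can combine groupby with cumsum applied to the 'split' column. The cumulative sum of the booleans increases by one at every True, so each run of identical 'bullish' values gets its own integer label, like so: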

for group_id, g in df.groupby(df['split'].cumsum()):
    print(f'group_id = {group_id}')
    print(g)

This prints three DataFrames, each containing a single 'bullish' value:

group_id = 1
   bullish  split
0     True   True
1     True  False
2     True  False
3     True  False
group_id = 2
   bullish  split
4    False   True
5    False  False
group_id = 3
   bullish  split
6     True   True
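If you want to keep the subsets rather than print them, a minimal sketch is to collect the groups into a list of DataFrames (the names group_labels and subsets are illustrative, not from the question):

```python
import pandas as pd

df = pd.DataFrame({'bullish': [True, True, True, True, False, False, True]})
df['split'] = df['bullish'] != df['bullish'].shift()

# Each True in 'split' increments the running total, so every run of
# identical 'bullish' values gets its own integer label: 1, 1, 1, 1, 2, 2, 3
group_labels = df['split'].cumsum()

# One DataFrame per run, in the order the runs appear
subsets = [g for _, g in df.groupby(group_labels)]

for sub in subsets:
    # Every subset holds only True or only False in 'bullish'
    assert sub['bullish'].nunique() == 1
    print(sub, end='\n\n')
```

Each element of subsets keeps the original index, so you can still map rows back to the full DataFrame.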

Since you did not specify what output you actually wanted (stating it is another good practice on SO), I will leave it at that.

Upvotes: 1
