user37143
user37143

Reputation: 111

Boolean Dataframe filter for another Dataframe

The following dataframe df1 contains numerical values

   IDs          Value1      Value2        Value     Value4
   AB              1          1             1       5
   BC              2          2             2       3
   BG              1          1             4       1
   RF              2          2             2       7

and this dataframe df2 contains Boolean values:

   Index          0                1             2         3
   1              True           False          True       True
   2              False          False          True       False
   3              False          False          True       False
   4              False          False          False      False

with the same number of columns and rows.

What I need is to subset df1 in the following manner: get only the columns that in df2 have at least on True value.

Meaning the following:

   IDs          Value1         Value3     Value4
   AB              1              1       5
   BC              2              2       3
   BG              1              4       1
   RF              2              2       7

I have tried the following code:

df2_true = np.any(df2,axis=1)

However, the line above returns a list which can not be used here:

result = df1[:,df2_true]

Any help would be welcome

Upvotes: 2

Views: 1052

Answers (3)

mxkrn
mxkrn

Reputation: 168

Your already in the right direction, however since your interested in masking the columns you just need to apply the np.any() operation on the other axis and then apply your boolean mask to the columns attribute of the original dataframe:

masked_df = df1.columns[df2.any(axis=0)]

Upvotes: 1

BENY
BENY

Reputation: 323226

I think it will work

df1.loc[:,df2.any(0).values.tolist()]
Out[741]: 
     Value1  Value  Value4
IDs                       
AB        1      1       5
BC        2      2       3
BG        1      4       1
RF        2      2       7

Upvotes: 3

jezrael
jezrael

Reputation: 862641

Use loc with np.any per index (axis=0):

result = df1.loc[:, np.any(df2.values,axis=0)]
print (result)
     Value1  Value  Value4
IDs                       
AB        1      1       5
BC        2      2       3
BG        1      4       1
RF        2      2       7

Upvotes: 3

Related Questions