Reputation: 111
The following dataframe df1
contains numerical values
IDs Value1 Value2 Value Value4
AB 1 1 1 5
BC 2 2 2 3
BG 1 1 4 1
RF 2 2 2 7
and this dataframe df2
contains Boolean values:
Index 0 1 2 3
1 True False True True
2 False False True False
3 False False True False
4 False False False False
with the same number of columns and rows.
What I need is to subset df1
in the following manner: get only the columns that in df2
have at least on True
value.
Meaning the following:
IDs Value1 Value3 Value4
AB 1 1 5
BC 2 2 3
BG 1 4 1
RF 2 2 7
I have tried the following code:
df2_true = np.any(df2,axis=1)
However, the line above returns a list which can not be used here:
result = df1[:,df2_true]
Any help would be welcome
Upvotes: 2
Views: 1052
Reputation: 168
Your already in the right direction, however since your interested in masking the columns you just need to apply the np.any() operation on the other axis and then apply your boolean mask to the columns attribute of the original dataframe:
masked_df = df1.columns[df2.any(axis=0)]
Upvotes: 1
Reputation: 323226
I think it will work
df1.loc[:,df2.any(0).values.tolist()]
Out[741]:
Value1 Value Value4
IDs
AB 1 1 5
BC 2 2 3
BG 1 4 1
RF 2 2 7
Upvotes: 3