Reputation: 2167
I have a data frame as below.
In [23]: data2 = [{'a': 'x', 'b': 'y','c':'q'}, {'a': 'x', 'b': 'p', 'c': 'q'}, {'a':'p', 'b':'q'},{'a':'q', 'b':'y','c':'q'}]
In [26]: df = pd.DataFrame(data2)
In [27]: df
Out[27]:
   a  b    c
0  x  y    q
1  x  p    q
2  p  q  NaN
3  q  y    q
I want to use boolean indexing to keep only the rows that contain either x or y. I am currently doing this as
In [29]: df[df['a'].isin(['x','y']) | (df['b'].isin(['x','y']))]
Out[29]:
   a  b  c
0  x  y  q
1  x  p  q
3  q  y  q
But I have over 50 columns to check, and testing each column individually does not seem very Pythonic. I tried
In [30]: df[df[['a','b']].isin(['x','y'])]
But the output is not what I expect. I get the following:
Out[30]:
     a    b    c
0    x    y  NaN
1    x  NaN  NaN
2  NaN  NaN  NaN
3  NaN    y  NaN
I can drop the rows that are all NaN, but values are still missing in the remaining rows.
For example, in row 0 column c is NaN, but I need that value.
Any suggestions on how to do this?
Upvotes: 0
Views: 1428
Reputation: 51335
This works:
df.loc[df.apply(lambda x: 'x' in list(x) or 'y' in list(x), axis=1)]
   a  b  c
0  x  y  q
1  x  p  q
3  q  y  q
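The same row-wise test extends to any number of target values. A minimal sketch, assuming the frame from the question; the names `targets` and `row_has_target` are made up here for illustration:

```python
import pandas as pd

data2 = [{'a': 'x', 'b': 'y', 'c': 'q'}, {'a': 'x', 'b': 'p', 'c': 'q'},
         {'a': 'p', 'b': 'q'}, {'a': 'q', 'b': 'y', 'c': 'q'}]
df = pd.DataFrame(data2)

# Hypothetical name: the set of values to look for in each row.
targets = {'x', 'y'}

def row_has_target(row):
    # True when at least one cell in the row is one of the targets.
    return not targets.isdisjoint(row)

# apply(..., axis=1) calls the function once per row and returns a boolean Series.
row_filtered = df.loc[df.apply(row_has_target, axis=1)]
```

Note that `apply` with `axis=1` runs a Python function per row, so it may be slow on large frames.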
Upvotes: 1
Reputation: 19947
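You can compare your df with 'x' and 'y', then combine the two comparisons with a logical or to find the rows that contain either value. Then use the resulting boolean Series as an index to select those rows. Pass the axis by keyword, since recent pandas versions no longer accept it positionally in any().
df.loc[(df.eq('x') | df.eq('y')).any(axis=1)]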
Out[68]:
   a  b  c
0  x  y  q
1  x  p  q
3  q  y  q
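For the 50-column case, the `isin` attempt from the question can probably be kept as long as it is reduced to one boolean per row. Indexing with a boolean DataFrame masks cell by cell, which is why the unmatched cells came back as NaN; collapsing the mask with `any(axis=1)` selects whole rows instead. A sketch, where `cols` is an assumed list of the columns to check:

```python
import pandas as pd

data2 = [{'a': 'x', 'b': 'y', 'c': 'q'}, {'a': 'x', 'b': 'p', 'c': 'q'},
         {'a': 'p', 'b': 'q'}, {'a': 'q', 'b': 'y', 'c': 'q'}]
df = pd.DataFrame(data2)

# Assumption: the columns to inspect; in practice this could list all 50 names.
cols = ['a', 'b']

# isin gives a boolean DataFrame; any(axis=1) reduces it to one flag per row.
mask = df[cols].isin(['x', 'y']).any(axis=1)

# A boolean Series selects whole rows, so column c keeps its values.
result = df[mask]
```

Using `df.isin(['x', 'y']).any(axis=1)` without the column subset checks every column.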
Upvotes: 2