Reputation: 8966
I'm trying to filter a dataframe based on the values in multiple columns, using a single condition, while keeping other columns to which I don't want to apply the filter at all.
I've reviewed several related answers, but still no luck.
Setup:
import pandas as pd

df = pd.DataFrame({
    'month': [1, 1, 1, 2, 2],
    'a': ['A', 'A', 'A', 'A', 'NONE'],
    'b': ['B', 'B', 'B', 'B', 'B'],
    'c': ['C', 'C', 'C', 'NONE', 'NONE']
}, columns=['month', 'a', 'b', 'c'])

l = ['month', 'a', 'c']
df = df.loc[df['month'] == df['month'].max(), df.columns.isin(l)].reset_index(drop=True)
Current Output:
   month     a     c
0      2     A  NONE
1      2  NONE  NONE
Desired Output:
   month     a
0      2     A
1      2  NONE
I've tried:
sub = l[1:]
df = df[(df.loc[:, sub] != 'NONE').any(axis = 1)]
and many other variations (.all(), [sub, :], ~df.loc[...], axis = 0), but all with no luck.
Basically I want to drop any column (within the sub list) that has all 'NONE' values in it.
Any help is much appreciated.
Upvotes: 3
Views: 2433
Reputation: 294278
You first want to substitute your 'NONE' with np.nan so that it is recognized as a null value by dropna. Then use loc with your boolean series and column subset. Then use dropna with axis=1 and how='all':
import numpy as np

# df here is the original frame from the Setup, before the .loc reassignment
df.replace('NONE', np.nan) \
    .loc[df.month == df.month.max(), l].dropna(axis=1, how='all')
   month    a
3      2    A
4      2  NaN
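Note that replacing 'NONE' with np.nan also changes the values that survive: the remaining 'NONE' in column a comes back as NaN. If the original strings should be kept, as in the desired output, one possible variation is to leave the data untouched and only work out which columns to drop. This is a minimal sketch rather than the only way to do it; the names latest, all_none and result are just illustrative, and it rebuilds the frame from the question's setup so it runs on its own.

```python
import pandas as pd

# Frame and column list from the question's setup.
df = pd.DataFrame({
    'month': [1, 1, 1, 2, 2],
    'a': ['A', 'A', 'A', 'A', 'NONE'],
    'b': ['B', 'B', 'B', 'B', 'B'],
    'c': ['C', 'C', 'C', 'NONE', 'NONE'],
})
l = ['month', 'a', 'c']
sub = l[1:]  # columns the 'NONE' check applies to

# Rows for the latest month, restricted to the columns of interest.
latest = df.loc[df['month'] == df['month'].max(), l]

# Boolean Series indexed by column name: True where every value is 'NONE'.
all_none = latest[sub].eq('NONE').all(axis=0)

# Drop only those columns; the 'NONE' strings in kept columns stay as they are.
result = latest.drop(columns=all_none[all_none].index).reset_index(drop=True)
print(result)
```

Here the check runs down each column (axis=0) on the already filtered rows, which is why it removes columns, whereas the any(axis = 1) attempt in the question builds a row mask. Column c is dropped and column a keeps its 'NONE' entry.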
Upvotes: 3