Pandas: Filter by values within multiple columns

Question

I'm trying to filter a DataFrame on the values in several columns using a single condition, while keeping the other columns that the filter should not apply to.

I've reviewed these answers, with the third being the closest, but still no luck:

Setup:

import pandas as pd

df = pd.DataFrame({
        'month':[1,1,1,2,2],
        'a':['A','A','A','A','NONE'],
        'b':['B','B','B','B','B'],
        'c':['C','C','C','NONE','NONE']
    }, columns = ['month','a','b','c'])

l = ['month','a','c']
df = df.loc[df['month'] == df['month'].max(), df.columns.isin(l)].reset_index(drop = True)

Current Output:

   month     a     c
0      2     A  NONE
1      2  NONE  NONE

Desired Output:

   month     a
0      2     A
1      2  NONE

I've tried:

sub = l[1:]
df = df[(df.loc[:, sub] != 'NONE').any(axis = 1)]

and many other variations (.all(), [sub, :], ~df.loc[...], (axis = 0)), but all with no luck.

Basically I want to drop any column (within the sub list) that has all 'NONE' values in it.

Any help is much appreciated.

piRSquared · Accepted Answer

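First replace 'NONE' with np.nan so that dropna recognizes it as a null value. Then use loc with your boolean mask and the column subset l, and finally call dropna with axis=1 and how='all' to drop every column that is entirely null. Run this against the original df from your setup, before it is reassigned.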

import numpy as np

df.replace('NONE', np.nan) \
    .loc[df.month == df.month.max(), l].dropna(axis=1, how='all')

   month    a
3      2    A
4      2  NaN
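
If the literal 'NONE' strings need to survive in the result, as in the desired output, one possible variant is to work out which columns to drop with a boolean mask instead of replacing values. This is only a sketch under the question's setup, not part of the answer above; the names latest, all_none, drop_cols and result are made up for illustration. It also shows why the df[...] attempt in the question did not work: indexing a DataFrame with a boolean Series selects rows, whereas here the mask is reduced over axis=0 so that it describes columns.

```python
import pandas as pd

# Same setup as in the question.
df = pd.DataFrame({
    'month': [1, 1, 1, 2, 2],
    'a': ['A', 'A', 'A', 'A', 'NONE'],
    'b': ['B', 'B', 'B', 'B', 'B'],
    'c': ['C', 'C', 'C', 'NONE', 'NONE'],
}, columns=['month', 'a', 'b', 'c'])

l = ['month', 'a', 'c']
sub = l[1:]  # columns the 'NONE' check applies to

# Rows for the latest month, restricted to the columns in l.
latest = df.loc[df['month'] == df['month'].max(), l]

# One boolean per column in sub: True where every remaining value is 'NONE'.
all_none = latest[sub].eq('NONE').all(axis=0)
drop_cols = all_none[all_none].index

# Drop the all-'NONE' columns and renumber the rows from 0.
result = latest.drop(columns=drop_cols).reset_index(drop=True)
print(result)
```

With the sample data this should leave only the month and a columns, with 'NONE' kept as a string in the second row.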
