Reputation: 113
Suppose we have a dataframe with the following columns: 'Age', 'Name', 'Sex', where 'Age' and 'Sex' contain missing values. I want to drop all columns with missing values except the column 'Age', so that I end up with a df with two columns, 'Name' and 'Age'. How can I do it?
Upvotes: 0
Views: 1114
Reputation: 1076
This should do what you need:
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'Age': [5, np.nan, 12, 43],
    'Name': ['Alice', 'Bob', 'Charly', 'Dan'],
    'Sex': ['F', 'M', 'M', np.nan],
})
df_filt = df.loc[:, (~df.isnull().any()) | (df.columns.isin(['Age']))]
Explanation:
df.isnull().any() checks, for each column, whether any value is None or NaN. The ~ inverts that result, so only the columns that do not meet that criterion (those without missing values) are selected.
df.columns.isin(['Age']) checks, for each column, whether its name is 'Age', so that this column is selected in any case.
Both conditions are combined with an OR (|), so a column is selected if either condition applies.
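If you need this for more than one protected column, the same idea can be wrapped in a small helper. This is only a sketch: the function name drop_na_columns_except and its keep parameter are made up for illustration and are not part of pandas.

```python
import numpy as np
import pandas as pd


def drop_na_columns_except(df, keep):
    """Drop columns containing missing values, except those named in `keep`.

    Hypothetical helper for illustration; not a pandas function.
    """
    has_na = df.isna().any()           # True for each column with a missing value
    protected = df.columns.isin(keep)  # True for each column that must be kept
    # Keep a column if it has no missing values OR it is protected.
    return df.loc[:, ~has_na | protected]


df = pd.DataFrame({
    'Age': [5, np.nan, 12, 43],
    'Name': ['Alice', 'Bob', 'Charly', 'Dan'],
    'Sex': ['F', 'M', 'M', np.nan],
})

df_filt = drop_na_columns_except(df, ['Age'])
print(list(df_filt.columns))  # ['Age', 'Name']
```

The columns keep their original order, and no rows are removed, so 'Age' still contains its NaN.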
Upvotes: 3